Video generating method, apparatus, medium, and terminal

ABSTRACT

Video generating methods, apparatuses, media, and terminals are provided. The video generating method includes: acquiring a virtual viewpoint trajectory in response to a recording instruction, where the virtual viewpoint trajectory is a set of virtual viewpoints arranged according to a chronological order, and the viewpoint is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is a range supporting viewpoint switching viewing in the to-be-viewed area; and recording a rendered picture under the viewpoint trajectory, where the rendered picture under the viewpoint trajectory is a picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory. Technical solutions in the example embodiments of the present disclosure may provide the user with the multi-angle degrees of freedom interactive experience.

CROSS REFERENCE TO RELATED PATENT APPLICATIONS

This application claims priority to the following Chinese Patent Applications: (1) CN201910177941.5, filed on 7 Mar. 2019, entitled “Method, Apparatus, Terminal, Capturing System, and Device for Setting Capturing Devices”, (2) CN201910172743.X, filed on 7 Mar. 2019, entitled “Method, Apparatus, Medium, and Device for Generating Multi-Angle Free-Perspective Image Data”, (3) CN201910172727.0, filed on 7 Mar. 2019, entitled “Method, Apparatus, Medium, and Server for Generating Multi-angle Free-perspective Video Data”, (4) CN201910172742.5, filed on 7 Mar. 2019, entitled “Method, Apparatus, Medium, Terminal, and Device for Processing Multi-Angle Free-Perspective Data”, (5) CN201910172729.X, filed on 7 Mar. 2019, entitled “Method, Apparatus, Medium, Terminal, and Device for Multi-Angle Free-Perspective Interaction”, (6) CN201910173415.1, filed on 7 Mar. 2019, entitled “Method, Apparatus, Medium, Terminal, and Device for Multi-Angle Free-Perspective Interaction”, (7) CN201910173413.2, filed on 7 Mar. 2019, entitled “Method, Apparatus, Medium, and Device for Processing Multi-Angle Free-Perspective Image Data”, (8) CN201910173414.7, filed on 7 Mar. 2019, entitled “Method, Apparatus, Medium, and Device for Processing Multi-Angle Free-Perspective Video Data”, (9) CN201910172761.8, filed on 7 Mar. 2019, entitled “Video Generating Method, Apparatus, Medium, and Terminal”, (10) CN201910172717.7, filed on 7 Mar. 2019, entitled “Video Reconstruction Method, System, Device, and Computer Readable Storage Medium”, and (11) CN201910172720.9, filed on 7 Mar. 2019, entitled “Image Reconstruction Method, System, Device, and Computer-Readable Storage Medium”, which are all hereby incorporated by reference in their entirety.

The present disclosure relates to data processing methods, and in particular, to video generating methods, apparatuses, media, and terminals.

BACKGROUND

In the field of data processing, applications for generating videos by recording screens are becoming more and more widespread. However, a video generated in such a manner can usually only be based on the viewpoint provided by the original video or image.

SUMMARY

This Summary is provided to introduce a selection of implementations in a simplified form that are further described below in Detailed Description. This Summary is not intended to identify all features of the claimed subject matter, nor is it intended to be used alone as an aid in determining the scope of the claimed subject matter. The term “techniques,” for instance, may refer to device(s), system(s), method(s) and/or processor-readable/computer-readable instructions as permitted by the context above and throughout the present disclosure.

A technical problem solved by the example embodiments of the present disclosure is how to provide a method for generating a viewpoint-variable video.

In order to solve the above technical problem, example embodiments of the present disclosure provide a video generating method, including: acquiring a virtual viewpoint trajectory in response to a recording instruction, wherein the virtual viewpoint trajectory is a set of virtual viewpoints arranged according to a chronological order, and each virtual viewpoint of the set of virtual viewpoints is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is a range supporting viewpoint switching viewing in the to-be-viewed area; and recording a rendered picture under the viewpoint trajectory, wherein the rendered picture under the viewpoint trajectory is a picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory.

In an example embodiment, the picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory includes images for viewing the to-be-viewed area based on the virtual viewpoint displayed according to the chronological order, wherein the images are generated based on a data combination and the virtual viewpoint, and the data combination includes pixel data and depth data of multiple synchronized images, and an associated relationship exists between image data and depth data of each image, and the multiple synchronized images have different perspectives with respect to the to-be-viewed area.

In an example embodiment, the images for viewing the to-be-viewed area based on the virtual viewpoint include multiple frame images for viewing the to-be-viewed area based on the virtual viewpoint.

In an example embodiment, recording the rendered picture under the viewpoint trajectory includes: acquiring a frame image displayed at each frame moment according to a display frame rate.

In an example embodiment, the method further includes: compressing a frame image displayed at each frame moment in a video format.

In an example embodiment, acquiring the virtual viewpoint trajectory includes: receiving user instructions, and determining the virtual viewpoints according to the user instructions; and arranging the virtual viewpoints in a chronological order of receiving the user instructions, to generate the virtual viewpoint trajectory.
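As a non-limiting illustration, the arrangement of virtual viewpoints in the chronological order of receiving the user instructions may be sketched as follows. This is a minimal sketch only; the names VirtualViewpoint, ViewpointTrajectory, add, viewpoints, and viewpoint_at are hypothetical and do not appear in the present disclosure, and the representation of a viewpoint as a position plus three rotation angles is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class VirtualViewpoint:
    """Hypothetical viewpoint record: a spatial position and a perspective."""
    position: Tuple[float, float, float]   # (x, y, z)
    rotation: Tuple[float, float, float]   # three directions of rotation


@dataclass
class ViewpointTrajectory:
    """Hypothetical container keeping viewpoints ordered by receiving time."""
    _entries: List[Tuple[float, VirtualViewpoint]] = field(default_factory=list)

    def add(self, received_at: float, viewpoint: VirtualViewpoint) -> None:
        # Store the viewpoint with the time its user instruction was received,
        # then keep the entries sorted so the trajectory stays chronological.
        self._entries.append((received_at, viewpoint))
        self._entries.sort(key=lambda entry: entry[0])

    def viewpoints(self) -> List[VirtualViewpoint]:
        # The trajectory itself: viewpoints in chronological order.
        return [viewpoint for _, viewpoint in self._entries]

    def viewpoint_at(self, t: float) -> VirtualViewpoint:
        # Viewpoint in effect at time t: the latest one received at or before t
        # (falls back to the earliest viewpoint when t precedes all entries).
        if not self._entries:
            raise ValueError("trajectory is empty")
        current = self._entries[0][1]
        for received_at, viewpoint in self._entries:
            if received_at > t:
                break
            current = viewpoint
        return current
```

In such a sketch, a recorder could query viewpoint_at once per frame moment to decide which viewpoint the rendered picture corresponds to.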

In an example embodiment, determining the virtual viewpoints according to the user instructions includes: determining a basic viewpoint for viewing the to-be-viewed area, where the basic viewpoint includes a position and a perspective of the basic viewpoint; and determining the virtual viewpoints based on the user instructions and an association relationship between the user instruction and a changing manner of the virtual viewpoint based on the basic viewpoint, with the basic viewpoint as a reference.

In an example embodiment, receiving user instructions includes: detecting a path of a touchpoint on a touch-sensitive screen, where the path includes at least one of a start point, an end point, and a moving direction of the touchpoint, with the path as the user instruction.

In an example embodiment, the association relationship between the path and the changing manner of the virtual viewpoint based on the basic viewpoint includes: the number of paths is two, where a touchpoint of at least one of the two paths moves in a direction away from the other touchpoint, and a position of the virtual viewpoint moves in a direction toward the to-be-viewed area.

In an example embodiment, the association relationship between the path and the changing manner of the virtual viewpoint based on the basic viewpoint includes: the number of paths is two, where a touchpoint of at least one of the two paths moves in a direction toward the other touchpoint, and a position of the virtual viewpoint moves in a direction away from the to-be-viewed area.

In an example embodiment, the association relationship between the path and the changing manner of the virtual viewpoint based on the basic viewpoint includes: the number of paths is one, wherein a moving distance of the touchpoint is associated with a magnitude of change of the perspective, and a moving direction of the touchpoint is associated with a direction of change of the perspective.
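As a non-limiting illustration, the association relationships above between touchpoint paths and the changing manner of the virtual viewpoint may be sketched as follows. This is a minimal sketch under stated assumptions: each path is reduced to its start point and end point, the function name classify_gesture and the returned dictionary keys are hypothetical, and the scale factor DEGREES_PER_PIXEL is an arbitrary illustrative value that the present disclosure does not specify.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Path = Tuple[Point, Point]  # (start point, end point) of one touchpoint

DEGREES_PER_PIXEL = 0.1  # assumed scale between moving distance and angle change


def classify_gesture(paths: List[Path]) -> Dict[str, object]:
    """Map one or two touchpoint paths to a change of the virtual viewpoint."""
    if len(paths) == 2:
        (start_1, end_1), (start_2, end_2) = paths
        start_gap = math.dist(start_1, start_2)
        end_gap = math.dist(end_1, end_2)
        if end_gap > start_gap:
            # Touchpoints moved apart: viewpoint moves toward the viewed area.
            return {"action": "move_closer", "amount": end_gap - start_gap}
        if end_gap < start_gap:
            # Touchpoints moved together: viewpoint moves away from the area.
            return {"action": "move_away", "amount": start_gap - end_gap}
        return {"action": "none"}
    if len(paths) == 1:
        (start_x, start_y), (end_x, end_y) = paths[0]
        dx, dy = end_x - start_x, end_y - start_y
        distance = math.hypot(dx, dy)
        if distance == 0:
            return {"action": "none"}
        # Moving distance sets the magnitude of the perspective change;
        # moving direction sets the direction of the perspective change.
        return {
            "action": "rotate",
            "magnitude_degrees": distance * DEGREES_PER_PIXEL,
            "direction": (dx / distance, dy / distance),
        }
    return {"action": "none"}
```

The resulting change would then be applied with the basic viewpoint as a reference to obtain the virtual viewpoint.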

In an example embodiment, the user instructions include a voice control instruction.

In an example embodiment, the user instructions include a selection of a preset viewpoint for viewing the to-be-viewed area.

In an example embodiment, the preset viewpoint is taken as the virtual viewpoint.

In an example embodiment, the user instructions include a selection of a specific object in the to-be-viewed area by the user.

In an example embodiment, prior to receiving the user instructions, the method further includes: determining the specific object in the to-be-viewed area through an image recognition technology; and providing a selection option of the specific object.

In an example embodiment, the user instructions include at least one of a position and a perspective of the virtual viewpoint.

In an example embodiment, the user instructions include a voice control instruction.

In an example embodiment, the user instructions include attitude change information from at least one of a gyroscope and a gravity sensor.

In an example embodiment, the user instructions are received during a process of playing a video or displaying an image.

Example embodiments of the present disclosure further provide a video generating apparatus, including: a virtual viewpoint trajectory acquiring unit, configured to acquire a virtual viewpoint trajectory in response to a recording instruction, wherein the virtual viewpoint trajectory is a set of virtual viewpoints arranged according to a chronological order, and each virtual viewpoint of the set of virtual viewpoints is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is a range supporting viewpoint switching viewing in the to-be-viewed area; and a rendered picture recording unit, configured to record a rendered picture under the viewpoint trajectory, where the rendered picture under the viewpoint trajectory is a picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory.

Example embodiments of the present disclosure further provide a computer-readable storage medium having computer instructions stored thereon, where when the computer instructions are executed, the steps of the video generating method are performed.

Example embodiments of the present disclosure further provide a terminal including a memory and a processor, the memory storing computer instructions thereon capable of running on the processor, where when the processor executes the computer instructions, the steps of the video generating method are performed.

Compared with the conventional techniques, the technical solutions of the example embodiments of the present disclosure have the following beneficial effects:

By acquiring a virtual viewpoint trajectory in response to a recording instruction, and recording a rendered picture under the viewpoint trajectory, the generated video may correspond to different viewpoints, and the video content is more flexible. User operations that are supported are more diversified, and thus user experience may be further improved.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to illustrate the example embodiments of the present disclosure more clearly, the drawings used in the description of the example embodiments will be briefly introduced below. Apparently, the drawings in the following description represent some of the example embodiments of the present disclosure, and other drawings may be obtained from these drawings by those skilled in the art without any creative efforts.

FIG. 1 is a schematic diagram of a to-be-viewed area in an example embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a setting method of capturing devices in an example embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a multi-angle free-perspective display system in an example embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a device display in an example embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a control performed on a device in an example embodiment of the present disclosure;

FIG. 6 is a schematic diagram of another control performed on a device in an example embodiment of the present disclosure;

FIG. 7 is a schematic diagram of another setting method of capturing devices in an example embodiment of the present disclosure;

FIG. 8 is a schematic diagram of another control performed on a device in an example embodiment of the present disclosure;

FIG. 9 is a schematic diagram of another device display in an example embodiment of the present disclosure;

FIG. 10 is a flowchart of a setting method of capturing devices in an example embodiment of the present disclosure;

FIG. 11 is a schematic diagram of a multi-angle free-perspective range in an example embodiment of the present disclosure;

FIG. 12 is a schematic diagram of another multi-angle free-perspective range in an example embodiment of the present disclosure;

FIG. 13 is a schematic diagram of another multi-angle free-perspective range in an example embodiment of the present disclosure;

FIG. 14 is a schematic diagram of another multi-angle free-perspective range in an example embodiment of the present disclosure;

FIG. 15 is a schematic diagram of another multi-angle free-perspective range in an example embodiment of the present disclosure;

FIG. 16 is a schematic diagram of another setting method of capturing devices in an example embodiment of the present disclosure;

FIG. 17 is a schematic diagram of another setting method of capturing devices in an example embodiment of the present disclosure;

FIG. 18 is a schematic diagram of another setting method of capturing devices in an example embodiment of the present disclosure;

FIG. 19 is a flowchart of a method for generating multi-angle free-perspective data in an example embodiment of the present disclosure;

FIG. 20 is a schematic diagram of distribution positions of the pixel data and the depth data of a single image in an example embodiment of the present disclosure;

FIG. 21 is a schematic diagram of distribution positions of the pixel data and the depth data of another single image in an example embodiment of the present disclosure;

FIG. 22 is a schematic diagram of distribution positions of the pixel data and the depth data of an image in an example embodiment of the present disclosure;

FIG. 23 is a schematic diagram of distribution positions of the pixel data and the depth data of another image in an example embodiment of the present disclosure;

FIG. 24 is a schematic diagram of distribution positions of the pixel data and the depth data of another image in an example embodiment of the present disclosure;

FIG. 25 is a schematic diagram of distribution positions of the pixel data and the depth data of another image in an example embodiment of the present disclosure;

FIG. 26 is a schematic diagram of image area stitching in an example embodiment of the present disclosure;

FIG. 27 is a schematic diagram of a structure of a stitched image in an example embodiment of the present disclosure;

FIG. 28 is a schematic diagram of another structure of a stitched image in an example embodiment of the present disclosure;

FIG. 29 is a schematic diagram of another structure of a stitched image in an example embodiment of the present disclosure;

FIG. 30 is a schematic diagram of another structure of a stitched image in an example embodiment of the present disclosure;

FIG. 31 is a schematic diagram of another structure of a stitched image in an example embodiment of the present disclosure;

FIG. 32 is a schematic diagram of another structure of a stitched image in an example embodiment of the present disclosure;

FIG. 33 is a schematic diagram of the pixel data distribution of an image in an example embodiment of the present disclosure;

FIG. 34 is a schematic diagram of another pixel data distribution of an image in an example embodiment of the present disclosure;

FIG. 35 is a schematic diagram of data storage in a stitched image in an example embodiment of the present disclosure;

FIG. 36 is a schematic diagram of another data storage in a stitched image in an example embodiment of the present disclosure;

FIG. 37 is a flowchart of a method for generating multi-angle free-perspective video data in an example embodiment of the present disclosure;

FIG. 38 is a flowchart of a method for processing multi-angle free-perspective data in an example embodiment of the present disclosure;

FIG. 39 is a flowchart of a method for reconstructing an image for a virtual viewpoint in an example embodiment of the present disclosure;

FIG. 40 is a flowchart of a multi-angle free-perspective image data processing method in an example embodiment of the present disclosure;

FIG. 41 is a flowchart of a method for processing multi-angle free-perspective video data in an example embodiment of the present disclosure;

FIG. 42 is a flowchart of a multi-angle free-perspective interaction method in an example embodiment of the present disclosure;

FIG. 43 is a schematic diagram of another control performed on a device in an example embodiment of the present disclosure;

FIG. 44 is a schematic diagram of another device display in an example embodiment of the present disclosure;

FIG. 45 is a schematic diagram of another control performed on a device in an example embodiment of the present disclosure;

FIG. 46 is a schematic diagram of another device display in an example embodiment of the present disclosure;

FIG. 47 is a flowchart of another video generating method in an example embodiment of the present disclosure;

FIG. 48 is a structural schematic diagram of a video generating apparatus in an example embodiment of the present disclosure;

FIG. 49 is a schematic structural diagram of a virtual viewpoint trajectory acquiring unit in an example embodiment of the present disclosure;

FIG. 50 is a schematic structural diagram of an instruction receiving subunit in an example embodiment of the present disclosure;

FIG. 51 is a schematic diagram of a process for generating multi-angle free-perspective data in an example embodiment of the present disclosure;

FIG. 52 is a schematic diagram of a multi-camera 6DoF capturing system in an example embodiment of the present disclosure;

FIG. 53 is a schematic diagram of generating and processing of 6DoF video data in an example embodiment of the present disclosure;

FIG. 54 is a structural schematic diagram of the data header file in an example embodiment of the present disclosure;

FIG. 55 is a schematic diagram of 6DoF video data processing on the user side in an example embodiment of the present disclosure;

FIG. 56 is a schematic diagram of input and output of a reference software in an example embodiment of the present disclosure; and

FIG. 57 is a schematic diagram of an algorithm architecture of a reference software in an example embodiment of the present disclosure.

DETAILED DESCRIPTION

To enable a person of ordinary skill in the art to better understand the solutions of the present disclosure, hereinafter, technical solutions in the example embodiments of the present disclosure will be clearly and thoroughly described with reference to the accompanying drawings in the example embodiments of the present disclosure. Example embodiments described herein merely represent some of the example embodiments of the present disclosure. Other example embodiments obtained by a person of ordinary skill in the art based on the example embodiments of the present disclosure without making creative efforts should fall within the scope of the present disclosure.

As described above, usually, the video generated by screen recording can only be based on the viewpoint provided by the original video or image.

In the example embodiments of the present disclosure, by acquiring a virtual viewpoint trajectory in response to a recording instruction, and recording a rendered picture under the viewpoint trajectory, the generated video may correspond to different viewpoints, and the video content is more flexible. User operations that are supported are more diversified, and thus user experience may be further improved.

In order to make the above objectives, features, and beneficial effects of the present disclosure more comprehensible, specific example embodiments of the present disclosure will be described in detail hereinafter with reference to the accompanying drawings.

As an example embodiment of the present disclosure, the applicant describes the following steps. The first step is capturing and depth map calculation, which includes three main steps: multi-camera video capturing, calculation of camera internal and external parameters (camera parameter estimation), and depth map calculation. For multi-camera capturing, the videos captured by the respective cameras are required to be aligned at the frame level. Referring to FIG. 51, through the multi-camera video capturing at 5102, a texture image may be obtained at 5104, i.e., the multiple synchronized images as described hereinafter. Through the calculation of camera internal and external parameters at 5106, camera parameters may be obtained at 5108, including internal parameter data and external parameter data as described hereinafter. Through the depth map calculation at 5110, a depth map may be obtained at 5112.

In this solution, no special camera, such as a light field camera, is required to capture the video. Similarly, no complicated camera calibration is required before capturing. Positions of multiple cameras may be laid out and arranged to better capture the objects or scenarios that need to be captured. Referring to FIG. 52, multiple capturing devices, such as camera 1 to camera N, may be set in the to-be-viewed area.

After the above three steps are processed, the texture image captured from multiple cameras, all camera parameters, and the depth map of each camera are obtained. These three pieces of data may be referred to as data files in multi-angle free-perspective video data, and may also be referred to as 6 degrees of freedom video data (6DoF video data) 5114. With these pieces of data, the user terminal may generate a virtual viewpoint based on the virtual 6 degrees of freedom (DoF) position, thereby providing a 6DoF video experience.

Referring to FIG. 53, 6DoF video data and indicative data at 5302 may be compressed and transmitted to the user side at 5304, where the indicative data may also be referred to as metadata. The user side may obtain the user-side 6DoF expression at 5306 according to the received data, i.e., the above 6DoF video data and metadata.

Referring to FIG. 54, metadata may be used to describe the data pattern of 6DoF video data, which may include stitching pattern metadata 5402, which is used to indicate storage rules of the pixel data and the depth data of multiple images in the stitched image; padding pattern metadata 5404, which may be used to indicate the padding pattern in the stitched image; and other metadata 5406. The metadata may be stored in the data header file, and the storage order may be as shown in FIG. 54, or may be other orders.

Referring to FIG. 55, the user terminal obtains 6DoF video data, which includes 6DoF position 5502, camera parameters 5504, the texture image and the depth map 5506, and descriptive metadata (metadata) 5508, as well as interaction behavior data of the user terminal 5510. With these pieces of data, the user may use 6DoF rendering based on depth image-based rendering (DIBR) 5512 to generate the virtual viewpoint image at the 6DoF position generated according to the user behavior, that is, to determine the virtual viewpoint of the 6DoF position corresponding to the user instruction.

In an example embodiment implemented during a test, each test example includes 20 seconds of video data captured by 30 cameras. The video data is 30 frames/second with a resolution of 1920×1080, so for any one of the 30 cameras, there are 600 frames of data in total. The main folder includes the texture image folder and the depth map folder. Under the texture image folder, the secondary directories from 0 to 599 may be found. These secondary directories respectively represent 600 frames of content corresponding to the 20-second video. Each secondary directory includes texture images captured by the 30 cameras, named from 0.yuv to 29.yuv in the format of yuv420. Accordingly, in the depth map folder, each secondary directory includes 30 depth maps calculated by the depth estimation algorithm. Each depth map corresponds to the texture image with the same name. The texture images and corresponding depth maps of multiple cameras in one secondary directory belong to a certain frame moment in the 20-second video.
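As a non-limiting illustration, the folder layout of the test example above may be sketched as follows. This is a minimal sketch only: the folder names "texture" and "depth", the function names, and the assumption of 8 bits per sample for the yuv420 data are hypothetical and are not specified in the present disclosure; the frame rate, duration, resolution, camera count, and file naming follow the description above.

```python
from pathlib import Path

DURATION_S = 20          # seconds of video per test example
FRAME_RATE = 30          # frames per second
NUM_CAMERAS = 30         # cameras, files named 0.yuv to 29.yuv
WIDTH, HEIGHT = 1920, 1080
TEXTURE_DIR = "texture"  # hypothetical name of the texture image folder
DEPTH_DIR = "depth"      # hypothetical name of the depth map folder


def frame_count() -> int:
    """Number of secondary directories (frame moments) per test example."""
    return DURATION_S * FRAME_RATE


def yuv420_frame_bytes(width: int = WIDTH, height: int = HEIGHT) -> int:
    """Size of one yuv420 frame, assuming 8 bits per sample (an assumption)."""
    # One full-resolution luma plane plus two chroma planes at quarter size.
    return width * height * 3 // 2


def _image_path(root: str, folder: str, frame: int, camera: int) -> Path:
    if not 0 <= frame < frame_count():
        raise IndexError(f"frame {frame} outside 0..{frame_count() - 1}")
    if not 0 <= camera < NUM_CAMERAS:
        raise IndexError(f"camera {camera} outside 0..{NUM_CAMERAS - 1}")
    return Path(root) / folder / str(frame) / f"{camera}.yuv"


def texture_path(root: str, frame: int, camera: int) -> Path:
    """Path of the texture image of one camera at one frame moment."""
    return _image_path(root, TEXTURE_DIR, frame, camera)


def depth_path(root: str, frame: int, camera: int) -> Path:
    """Path of the depth map with the same name as the texture image."""
    return _image_path(root, DEPTH_DIR, frame, camera)
```

Under these assumptions, frame_count() evaluates to 600 and yuv420_frame_bytes() evaluates to 3110400 bytes for a 1920×1080 frame.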

All depth maps in the test example are generated by a preset depth estimation algorithm. In the test, these depth maps may provide good virtual viewpoint reconstruction quality at the virtual 6DoF position. In one case, a reconstructed image of the virtual viewpoint may be generated directly from the given depth maps. Alternatively, the depth map may also be generated or improved by the depth calculation algorithm based on the original texture image.

In addition to the depth map and the texture image, the test example also includes a .sfm file, which is used to describe the parameters of all 30 cameras. The data of the .sfm file is written in binary format. The data format is described hereinafter. Considering the adaptability to different cameras, a fisheye camera model with distortion parameters was used in the test. How to read and use camera parameter data from the file may be understood with reference to the DIBR reference software provided by the applicant. The camera parameter data includes the following fields:

(1) krt_R is the rotation matrix of the camera;
(2) krt_cc is the optical center position of the camera;
(3) krt_WorldPosition is the three-dimensional space coordinate of the camera;
(4) krt_kc is the distortion coefficient of the camera;
(5) src_width is the width of the calibration image;
(6) src_height is the height of the calibration image; and
(7) fisheye_radius and lens_fov are parameters of the fisheye camera.

In the technical solutions implemented by the present disclosure, the user may find the detailed code of how to read the corresponding parameters in the .sfm file from the preset parameter reading function (set_sfm_parameters function).

In the DIBR reference software, camera parameters, the texture image, the depth map, and the 6DoF position of the virtual camera are received as inputs, and the generated texture image and depth map at the virtual 6DoF position are output at the same time. The 6DoF position of the virtual camera is the above 6DoF position determined according to user behavior. The DIBR reference software may be the software that implements image reconstruction based on the virtual viewpoint in the example embodiments of the present disclosure.

Referring to FIG. 56, in the reference software, camera parameters 5602, the texture image 5604, the depth map 5606, and the 6DoF position of the virtual camera 5608 are received as inputs, and generated texture image 5610 and generated depth map 5612 at the virtual 6DoF position are output at the same time.

Referring to FIG. 57, the software may include the following processing steps: camera selection 5702, forward projection of the depth map 5704 and 5706, postprocessing of the depth map 5708 and 5710, backward projection of the texture image 5712 and 5714, fusion of multi-camera projected texture image 5716, and inpainting of the image 5718.

In the reference software, two cameras closest to the virtual 6DoF position may be selected by default to generate the virtual viewpoint.
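As a non-limiting illustration, the default selection of the two cameras closest to the virtual 6DoF position may be sketched as follows. This is a minimal sketch only: the function name select_nearest_cameras is hypothetical, and the use of Euclidean distance between spatial positions (ignoring the perspective) is an assumption, since the present disclosure does not specify how closeness is measured in the reference software.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def select_nearest_cameras(
    virtual_position: Position,
    camera_positions: Sequence[Position],
    count: int = 2,
) -> List[int]:
    """Return the indices of the `count` cameras nearest the virtual position."""
    # Rank camera indices by Euclidean distance to the virtual viewpoint
    # position; ties keep the lower camera index first (stable sort).
    ranked = sorted(
        range(len(camera_positions)),
        key=lambda index: math.dist(virtual_position, camera_positions[index]),
    )
    return ranked[:count]
```

For example, with cameras placed at x = 0, 1, 2, and 3 along a line and a virtual viewpoint at x = 1.2, the sketch selects the cameras at x = 1 and x = 2.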

In the postprocessing step of the depth map, the quality of the depth map may be improved by various methods, such as foreground padding, pixel-level filtering, and the like.

For the output generated image, a method for fusing texture images from two cameras is used. The fusion weight is a global weight and is determined by the distance of the position of the virtual viewpoint from the position of the reference camera. When the pixel of the output virtual viewpoint image is projected to only one camera, the projected pixel may be directly used as the value of the output pixel.
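As a non-limiting illustration, the fusion of texture pixels projected from two cameras may be sketched as follows. This is a minimal sketch only: the present disclosure states that the global weight is determined by distance but does not give the formula, so the normalized inverse-distance weighting used here is an assumption, and the function names fusion_weights and fuse_pixel are hypothetical. A pixel value of None stands for a pixel that was not projected from that camera.

```python
from typing import Optional, Tuple


def fusion_weights(distance_a: float, distance_b: float) -> Tuple[float, float]:
    """Global weights for cameras A and B (assumed formula).

    The camera nearer to the virtual viewpoint receives the larger weight,
    and the two weights sum to 1.
    """
    total = distance_a + distance_b
    if total == 0:
        return 0.5, 0.5
    return distance_b / total, distance_a / total


def fuse_pixel(
    pixel_a: Optional[float],
    pixel_b: Optional[float],
    weight_a: float,
    weight_b: float,
) -> Optional[float]:
    """Fuse one output pixel from the two projected texture images."""
    if pixel_a is None and pixel_b is None:
        # Hollow pixel: not projected from either camera, left for inpainting.
        return None
    if pixel_a is None:
        # Projected from only one camera: use that pixel directly.
        return pixel_b
    if pixel_b is None:
        return pixel_a
    return weight_a * pixel_a + weight_b * pixel_b
```

Because the weights are global, they would be computed once per output image from the two camera distances and then reused for every pixel.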

After the fusion step, if there are still hollow pixels that have notbeen projected to, an inpainting method may be used to fill the hollowpixels.

For the output depth map, for the convenience of errors and analysis, adepth map obtained by projecting from one of the cameras to the positionof the virtual viewpoint may be used as the output.

Additionally, 6DoF position of the virtual camera 5720 and cameraparameters 5722 may be used as the input for the camera selection step5702.

Those skilled in the art may understand that the above exampleembodiments are merely examples and are not limitations on theimplementation manners. The technical solutions in the presentdisclosure will be further described hereinafter.

Referring to FIG. 1, the to-be-viewed area may be a basketball court,and multiple capturing devices may be provided to perform data capturingon the to-be-viewed area.

For example, referring to FIG. 2, several capturing devices may be setalong a certain path at a height H_(LK) higher than the hoop. Forexample, six capturing devices may be set along the arc, i.e., thecapturing devices CJ₁ to CJ₆. Those skilled in the art may understandthat the setting position, number, and supporting manners of thecapturing devices may be various, and there is no limitation herein.

The capturing device may be a camera or a video camera capable ofsynchronous shooting, for example, a camera or a video camera capable ofsynchronous shooting through a hardware synchronization line. Withmultiple capturing devices capturing data in the to-be-viewed area,multiple images or video streams in synchronization may be obtained.According to the video streams captured by multiple capturing devices,multiple synchronized frame images may also be obtained as multiplesynchronized images. Those skilled in the art may understand that,ideally, the term synchronization refers to corresponding to the samemoment, but the existence of errors and deviations may also betolerated.

Referring to FIG. 3, in the example embodiments of the presentdisclosure, data may be captured in the to-be-viewed area through thecapturing system 31 including multiple capturing devices. The acquiredmultiple synchronized images may be processed by the capturing system 31or the server 32 to generate multi-angle free-perspective data which iscapable of supporting the device 33 that performs displaying to performvirtual viewpoint switching. The device 33 that performs displaying maydisplay the reconstructed image generated based on the multi-anglefree-perspective data. The reconstructed image corresponds to thevirtual viewpoint. According to the user instruction, reconstructedimages corresponding to different virtual viewpoints may be displayed,and the viewing position and viewing angle may be switched.

In implementations, the process of performing image reconstruction toobtain a reconstructed image may be implemented by the device 33 thatperforms displaying, or may be implemented by a device located on aContent Delivery Network (CDN) in an edge computing manner. Thoseskilled in the art may understand that FIG. 3 is merely an example, andis not a limitation on the capturing system, the server, the device thatperforms displaying, and the implementation manner. The process of imagereconstruction based on multi-angle free-perspective data will bedescribed in detail hereinafter with reference to FIG. 38 to FIG. 41 andwill not be repeated herein.

Referring to FIG. 4, following the previous example, the user may watchthe to-be-viewed area through the device that performs displaying. Inthis example embodiment, the to-be-viewed area is a basketball court. Asdescribed above, the viewing position and viewing angle may be switched.

For example, the user may slide the screen to switch the virtualviewpoint. In an example embodiment of the present disclosure, referringto FIG. 5, when the user slides the screen with his/her finger to theright, the virtual viewpoint for viewing may be switched. Stillreferring to FIG. 2, the position of the virtual viewpoint beforesliding may be VP₁. The position of the virtual viewpoint may be VP₂after the virtual viewpoint is switched by sliding the screen. Referringto FIG. 6, after sliding the screen, the reconstructed image displayedon the screen may be as shown in FIG. 6. The reconstructed image may beobtained by performing image reconstruction based on multi-anglefree-perspective data generated from data captured by multiple capturingdevices in an actual capturing scenario.

Those skilled in the art may understand that the image viewed before switching may also be a reconstructed image. The reconstructed image may be a frame image in a video stream. In addition, there are various manners to switch the virtual viewpoint according to the user instruction, which are not limited herein.

In implementations, the virtual viewpoint may be represented by 6 degrees of freedom (DoF) coordinates, where the spatial position of the virtual viewpoint may be represented as (x, y, z), and the perspective may be represented as three directions of rotation (θ, φ, γ).
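As a minimal sketch of the 6 DoF representation described above, a virtual viewpoint can be held as a spatial position plus three rotation angles, and a virtual viewpoint trajectory as a chronologically ordered list of such viewpoints. The class name `VirtualViewpoint`, its field names, and the sample coordinates are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VirtualViewpoint:
    """Hypothetical container for one virtual viewpoint with 6 degrees of freedom."""
    position: Tuple[float, float, float]  # spatial position (x, y, z)
    rotation: Tuple[float, float, float]  # three directions of rotation

    def degrees_of_freedom(self) -> int:
        # Three position coordinates plus three rotation angles.
        return len(self.position) + len(self.rotation)


# Illustrative values only: a viewpoint before and after the user slides the screen.
vp1 = VirtualViewpoint(position=(0.0, -5.0, 1.7), rotation=(0.0, 0.0, 0.0))
vp2 = VirtualViewpoint(position=(2.0, -5.0, 1.7), rotation=(0.0, 0.3, 0.0))

# A virtual viewpoint trajectory: viewpoints arranged in chronological order.
trajectory: List[VirtualViewpoint] = [vp1, vp2]
```

Keeping the viewpoint immutable makes it safe to reuse the same object in several trajectories.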

The virtual viewpoint is a three-dimensional concept. Three-dimensional information is required to generate the reconstructed image. In an implementation manner, the multi-angle free-perspective data may include the depth data for providing third-dimension information outside the plane image. Compared with other implementation manners, such as providing three-dimensional information through point cloud data, the data amount of the depth data is smaller. Implementations of generating multi-angle free-perspective data will be described in detail hereinafter with reference to FIG. 19 to FIG. 37 and will not be repeated herein.

In the example embodiments of the present disclosure, the switching of the virtual viewpoint may be performed within a certain range, which is the multi-angle free-perspective range. That is, within the multi-angle free-perspective range, the position of the virtual viewpoint and the perspective may be arbitrarily switched.

The multi-angle free-perspective range is related to the arrangement of the capturing devices. The broader the shooting coverage of the capturing devices is, the larger the multi-angle free-perspective range is. The quality of the picture displayed by the device that performs displaying is related to the number of capturing devices. Generally, the more capturing devices are set, the fewer hollow areas there are in the displayed picture.

Referring to FIG. 7, if two rows (an upper row and a lower row) of capturing devices are set in the basketball court, i.e., the upper row of capturing devices CJ₁ to CJ₆ and the lower row of capturing devices CJ₁₁ to CJ₁₆, the multi-angle free-perspective range is greater than when only one row of capturing devices is set.

Referring to FIG. 8, the user's finger may slide upward to switch the virtual viewpoint for viewing. After sliding the screen, the image displayed on the screen may be as shown in FIG. 9.

In implementations, if only one row of capturing devices is set, a certain degree of freedom in the vertical direction may also be obtained in the process of image reconstruction to obtain the reconstructed image, but the multi-angle free-perspective range thereof is smaller than that of the scenario where two rows of capturing devices are set in the vertical direction.

It may be understood by those skilled in the art that the above respective example embodiments and corresponding drawings are merely for illustrative purposes and are not intended to limit the association relationship between the setting of the capturing devices and the multi-angle free-perspective range, nor are they limitations of operation manners or obtained display effects of the device that performs displaying. Implementations of the virtual viewpoint switching viewing of the to-be-viewed area according to the user instruction will be described in detail hereinafter with reference to FIG. 43 to FIG. 47 and will not be repeated herein.

Hereinafter, a setting method of capturing devices is further described.

FIG. 10 is a flowchart of a setting method 1000 of capturing devices in an example embodiment of the present disclosure, which may include the following steps:

Step S1002, determining a multi-angle free-perspective range, where virtual viewpoint switching viewing in the to-be-viewed area is supported within the multi-angle free-perspective range;

Step S1004, determining setting positions of the capturing devices according to at least the multi-angle free-perspective range, where the setting positions are suitable for setting the capturing devices to perform data capturing in the to-be-viewed area.

Those skilled in the art may understand that a completely free perspective may refer to a perspective with 6 degrees of freedom. That is, the user may freely switch the spatial position and perspective of the virtual viewpoint on the device that performs displaying, where the spatial position of the virtual viewpoint may be expressed as (x, y, z), and the perspective may be expressed as three directions of rotation (θ, φ, γ). There are 6 degrees of freedom in total, and thus the perspective is referred to as a perspective with 6 degrees of freedom.

As described above, in the example embodiments of the present disclosure, the switching of the virtual viewpoint may be performed within a certain range, which is the multi-angle free-perspective range. That is, within the multi-angle free-perspective range, the position of the virtual viewpoint and the perspective may be arbitrarily switched.

The multi-angle free-perspective range may be determined according to the needs of the application scenario. For example, in some scenarios, the to-be-viewed area may have a core focus, such as the center of the stage, or the center of the basketball court, or the hoop of the basketball court. In such scenarios, the multi-angle free-perspective range may include a planar or three-dimensional area including the core focus. Those skilled in the art may understand that the to-be-viewed area may be a point, a plane, or a three-dimensional area, which is not limited herein.

As described above, the multi-angle free-perspective range may be various areas, and further examples are described hereinafter with reference to FIG. 11 to FIG. 15.

Referring to FIG. 11, point O represents the core focus. The multi-angle free-perspective range may be a sector area with the core focus as the center and located in the same plane as the core focus, such as the sector area A₁OA₂, or the sector area B₁OB₂. The multi-angle free-perspective range may also be a circular plane centered at point O.

Taking the multi-angle free-perspective range as the sector area A₁OA₂ as an example, the position of the virtual viewpoint may be continuously switched in this area. For example, the position of the virtual viewpoint may be continuously switched from A₁ along the arc segment A₁A₂ to A₂. Alternatively, the position of the virtual viewpoint may also be continuously switched along the arc segment L₁L₂. Alternatively, the position is switched in the multi-angle free-perspective range in other manners. Accordingly, the perspective of the virtual viewpoint may also be changed in this area.

Further referring to FIG. 12, the core focus may be the center point E of the basketball court. The multi-angle free-perspective range may be a sector area with the center point E as the center and located in the same plane as the center point E, such as the sector area F₁₂₁EF₁₂₂. The center point E of the basketball court may be located on the ground of the court. Alternatively, the center point E of the basketball court may be at a certain height from the ground. The height of the arc endpoint F₁₂₁ and the height of the arc endpoint F₁₂₂ of the sector area may be the same, for example, the height H121 in the figure.

Referring to FIG. 13, the core focus is represented by point O. The multi-angle free-perspective range may be a part of a sphere centered on the core focus. For example, the area C₁C₂C₃C₄ is used to illustrate a partial area of the spherical surface, and the multi-angle free-perspective range may be a three-dimensional range formed by the area C₁C₂C₃C₄ and the point O. Any point within this range may be used as the position of the virtual viewpoint.

Further referring to FIG. 14, the core focus may be the center point E of the basketball court. The multi-angle free-perspective range may be a part of the sphere centered on the center point E. For example, the area F₁₃₁F₁₃₂F₁₃₃F₁₃₄ illustrates a partial area of the spherical surface. The multi-angle free-perspective range may be a three-dimensional range formed by the area F₁₃₁F₁₃₂F₁₃₃F₁₃₄ and the center point E.

In the scenario with the core focus, the core focus may be at various positions, and the multi-angle free-perspective range may also take various forms, which are not listed herein one by one. Those skilled in the art may understand that the above respective example embodiments are merely examples and are not limitations on the multi-angle free-perspective range. Moreover, the shapes shown therein are not limitations on actual scenarios and applications.

In implementations, the core focus may be determined according to the scenario. In a shooting scenario, there may also be multiple core focuses, and the multi-angle free-perspective range may be a superposition of multiple sub-ranges.

In other application scenarios, the multi-angle free-perspective range may also be without the core focus. For example, in some application scenarios, it is necessary to provide multi-angle free-perspective viewing of historic buildings, or to provide multi-angle free-perspective viewing of art exhibitions. Accordingly, the multi-angle free-perspective range may be determined according to the requirements of these scenarios.

Those skilled in the art may understand that the shape of the multi-angle free-perspective range may be arbitrary. Any point within the multi-angle free-perspective range may be used as the position of the virtual viewpoint.

Referring to FIG. 15, the multi-angle free-perspective range may be the cube D₁D₂D₃D₄D₅D₆D₇D₈, and the to-be-viewed area is the surface D₁D₂D₃D₄. Then, any point in the cube D₁D₂D₃D₄D₅D₆D₇D₈ may be used as the position of the virtual viewpoint. The perspective of the virtual viewpoint, i.e., the viewing angle, may be various. For example, the position E₆ on the surface D₅D₆D₇D₈ may be selected to view with the perspective of E₆D₁ or to view along the angle of E₆D₉, where the point D₉ is selected from the to-be-viewed area.

In implementations, after the multi-angle free-perspective range is determined, the positions of the capturing devices may be determined according to the multi-angle free-perspective range.

In an example embodiment, the setting positions of the capturing devices may be selected within the multi-angle free-perspective range. For example, the setting positions of the capturing devices may be determined at boundary points of the multi-angle free-perspective range.

Referring to FIG. 16, the core focus may be the center point E of the basketball court, and the multi-angle free-perspective range may be the sector area with the center point E as the center and located in the same plane as the center point E, such as the sector area F₆₁EF₆₂. The capturing devices may be set inside the multi-angle free-perspective range, for example, along the arc F₆₅F₆₆. Areas that are not covered by the capturing devices may be reconstructed using algorithms. In implementations, the capturing devices may also be set along the arc F₆₁F₆₂, and the capturing devices may be set at the ends of the arc to improve the quality of the reconstructed image. Each capturing device may be set towards the center point E of the basketball court. The position of the capturing device may be represented by spatial position coordinates, and the orientation of the capturing device may be represented by three rotation directions.

In implementations, two or more setting positions may be set, and correspondingly, two or more capturing devices may be set. The number of capturing devices may be determined according to the requirements of the quality of the reconstructed image or video. In a scenario with a higher requirement on the picture quality of the reconstructed image or video, the number of capturing devices may be greater. In a scenario with a lower requirement on the picture quality of the reconstructed image or video, the number of capturing devices may be smaller.

Still referring to FIG. 16, those skilled in the art may understand that if a higher picture quality of the reconstructed image or video and fewer holes in the reconstructed image are pursued, a larger number of capturing devices may be set along the arc F₆₁F₆₂. For example, 40 cameras may be set.
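The arc arrangement described above can be sketched as follows: setting positions are spaced evenly along an arc around the core focus, and each capturing device is oriented towards that focus. The function name `arc_camera_positions`, the even spacing, the single yaw angle used as the orientation, and all numeric values are assumptions for illustration and are not taken from the disclosure.

```python
import math


def arc_camera_positions(center, radius, start_deg, end_deg, count, height=0.0):
    """Return ((x, y, z), yaw) pairs for `count` devices evenly spaced on an arc.

    `center` is the (x, y) position of the core focus. The yaw (in radians) of
    each device is the horizontal direction from the device to the core focus.
    """
    if count < 2:
        raise ValueError("at least two setting positions are needed to span an arc")
    cx, cy = center
    step = (end_deg - start_deg) / (count - 1)
    cameras = []
    for i in range(count):
        angle = math.radians(start_deg + i * step)
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        yaw = math.atan2(cy - y, cx - x)  # points from the device back to the focus
        cameras.append(((x, y, height), yaw))
    return cameras


# Illustrative values only: 40 devices on a 120-degree arc of radius 20 around E.
cams = arc_camera_positions(center=(0.0, 0.0), radius=20.0,
                            start_deg=30.0, end_deg=150.0, count=40, height=3.0)
```

Because the first and last indices map to the start and end angles, two of the devices land on the ends of the arc, matching the placement suggested for improving reconstruction quality.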

Referring to FIG. 17, the core focus may be the center point E of the basketball court, and the multi-angle free-perspective range may be a part of the sphere centered on the center point E. For example, the area F₆₁F₆₂F₆₃F₆₄ illustrates a partial area of the spherical surface, and the multi-angle free-perspective range may be a three-dimensional range formed by the area F₆₁F₆₂F₆₃F₆₄ and the center point E. The capturing devices may be set inside the multi-angle free-perspective range, for example, along the arc F₆₅F₆₆ and the arc F₆₇F₆₈. Similar to the previous example, areas that are not covered by the capturing devices may be reconstructed using algorithms. In implementations, the capturing devices may also be set along the arc F₆₁F₆₂ and the arc F₆₃F₆₄, and the capturing devices may be set at the ends of the arc to improve the quality of the reconstructed image.

Each capturing device may be set to face the center point E of the basketball court. Those skilled in the art may understand that, although not being shown in the figure, the number of capturing devices along the arc F₆₁F₆₂ may be more than the number of capturing devices along the arc F₆₃F₆₄.

As described above, in some application scenarios, the to-be-viewed area may include the core focus. Accordingly, the multi-angle free-perspective range includes the area where the perspective is directed to the core focus. In such an application scenario, the setting positions of the capturing devices may be selected from an arc-shaped area whose concave direction (radius direction) points to the core focus.

When the to-be-viewed area includes the core focus, the setting positions are selected in the arc-shaped area whose concave direction points to the core focus, so that the capturing devices are arranged in an arc shape. Because the to-be-viewed area includes the core focus, the perspective points to the core focus. In such a scenario, the capturing devices are arranged in the arc shape, such that fewer capturing devices may be used to cover a larger multi-angle free-perspective range.

In implementations, the setting positions of the capturing devices may be determined with reference to the perspective range and the boundary shape of the to-be-viewed area. For example, the setting positions of the capturing devices may be determined at a preset interval along the boundary of the to-be-viewed area within the perspective range.

Referring to FIG. 18, the multi-angle free-perspective range may be without the core focus. For example, the position of the virtual viewpoint may be selected from the hexahedron F₈₁F₈₂F₈₃F₈₄F₈₅F₈₆F₈₇F₈₈, and the virtual viewpoint position is used for viewing the to-be-viewed area. The boundary of the to-be-viewed area may be the ground boundary of the court. The capturing devices may be set along the intersecting line B₈₉B₉₄ of the ground boundary line with the to-be-viewed area. For example, six capturing devices may be set at positions B₈₉ to B₉₄. The degree of freedom in the up and down direction may be realized by an algorithm. Alternatively, another row of capturing devices may be set at positions whose horizontal projections are on the intersecting line B₈₉B₉₄.

In implementations, the multi-angle free-perspective range may also support viewing from the upper side of the to-be-viewed area, and the upper side is in a direction away from the horizontal plane.

Accordingly, the capturing device may be mounted on a drone to set the capturing device on the upper side of the to-be-viewed area, or may be set on the top of the building where the to-be-viewed area is located. The top of the building is the structure in the direction away from the horizontal plane.

For example, the capturing device may be set on the top of the basketball stadium, or may hover on the upper side of the basketball court by being carried by a drone. Similarly, the capturing device may be set on the top of the stadium where the stage is located, or may be carried by a drone.

By setting the capturing device on the upper side of the to-be-viewed area, the multi-angle free-perspective range may include the perspective above the to-be-viewed area.

In implementations, the capturing device may be a camera or a video camera, and the captured data may be pictures or video data.

Those skilled in the art may understand that the manner in which the capturing device is set at the setting position may be various. For example, the capturing device may be supported by a support frame at the setting position, or may be set in other manners.

In addition, those skilled in the art may understand that the above respective example embodiments are merely examples for illustration, and are not limitations on the setting manner of capturing devices. In various application scenarios, the implementations of determining the setting positions of the capturing devices and setting the capturing devices for capturing according to the multi-angle free-perspective range are all within the protection scope of the present disclosure.

Hereinafter, the method for generating multi-angle free-perspective data is further described.

As described above, still referring to FIG. 3, the acquired multiple synchronized images may be processed by the capturing system 31 or the server 32 to generate multi-angle free-perspective data that is capable of supporting the device 33 that performs displaying to switch the virtual viewpoint. The multi-angle free-perspective data may indicate the third-dimension information outside the two-dimensional image through the depth data.

In an example embodiment, referring to FIG. 19, generating the multi-angle free-perspective data may include the following steps:

Step S1902, acquiring multiple synchronized images, where the shooting angles of the multiple images are different;

Step S1904, determining the depth data of each image based on the multiple images;

Step S1906, for each of the images, storing the pixel data of the image in a first field, and storing the depth data in at least a second field associated with the first field.

The multiple synchronized images may be images captured by the camera or frame images in video data captured by the video camera. In the process of generating the multi-angle free-perspective data, the depth data of each image may be determined based on the multiple images.

The depth data may include a depth value corresponding to a pixel of the image. The distance from the capturing device to each point in the to-be-viewed area may be used as the above depth value, and the depth value may directly reflect the geometry of the visible surface in the to-be-viewed area. The depth value may be the distance from respective points in the to-be-viewed area along the optical axis of the camera to the optical center, and the origin of the camera coordinate system may be used as the optical center. Those skilled in the art may understand that the distance may be a relative value, and multiple images may be based on the same reference.
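To make the second definition above concrete, the sketch below computes the depth of a scene point as its distance from the optical center measured along the optical axis, which is the projection of the point's offset onto the unit axis direction. The function name `depth_along_optical_axis` and the sample coordinates are hypothetical, and the disclosure does not prescribe this particular computation.

```python
import math


def depth_along_optical_axis(point, optical_center, optical_axis):
    """Project (point - optical_center) onto the normalized optical axis."""
    norm = math.sqrt(sum(a * a for a in optical_axis))
    if norm == 0.0:
        raise ValueError("the optical axis must be a non-zero vector")
    return sum((p - c) * a / norm
               for p, c, a in zip(point, optical_center, optical_axis))


# Illustrative values: optical center at the origin of the camera coordinate
# system, optical axis along +z, and one point in the to-be-viewed area.
origin = (0.0, 0.0, 0.0)
axis = (0.0, 0.0, 1.0)
scene_point = (3.0, 4.0, 12.0)

depth = depth_along_optical_axis(scene_point, origin, axis)  # 12.0
straight_line = math.dist(scene_point, origin)               # 13.0
```

The two values differ for any point that is off the optical axis, which is why all images should use the same definition and the same reference.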

Further, the depth data may include depth values corresponding to the pixels of the image on a one-to-one basis. Alternatively, the depth data may be some values selected from a set of depth values corresponding to the pixels of the image on a one-to-one basis.

Those skilled in the art may understand that the set of depth values may be stored in the form of a depth map. In implementations, the depth data may be data obtained by down-sampling the original depth map. The original depth map is the image form in which the set of depth values corresponding to the pixels of the image on a one-to-one basis is stored according to the arrangement of the pixel points of the image.

In implementations, the pixel data of the image stored in the first field may be original image data, such as data obtained from the capturing device, or may be data with a reduced resolution of the original image data. Further, the pixel data of the image may be the original pixel data of the image, or the pixel data with reduced resolution. The pixel data of the image may be any one of YUV data and RGB data, or may be other data capable of expressing the image.

In implementations, the amount of the depth data stored in the second field may be the same as or different from the amount of pixel points corresponding to the pixel data of the image stored in the first field. The amount may be determined according to the bandwidth limitation of data transmission of the device terminal that processes the multi-angle free-perspective image data. If the bandwidth is small, the amount of data may be reduced in the above manners, such as down-sampling or resolution reduction.

In implementations, for each of the images, the pixel data of the image may be sequentially stored in multiple fields in a preset order, and these fields may be consecutive or may be distributed in an interleaving manner with the second field. The fields storing the pixel data of the image may be used as the first fields. Hereinafter, examples are provided for explanation.

Referring to FIG. 20, the pixel data of an image, represented by pixel 1 to pixel 6 and other pixels not shown in the figure, may be stored in multiple consecutive fields in a preset order. These consecutive fields may be used as the first fields. The depth data corresponding to the image, represented by depth value 1 to depth value 6 and other depth values not shown in the figure, may be stored in multiple consecutive fields in a preset order. These consecutive fields may be used as the second fields. The preset order may be storing line by line sequentially according to the distribution positions of the image pixels, or may be other orders.

Referring to FIG. 21, the pixel data and corresponding depth values of an image may also be stored in multiple fields alternately. Multiple fields storing the pixel data may be used as the first fields, and multiple fields storing the depth values may be used as the second fields.

In implementations, the depth data may be stored in the same order as the pixel data of the image, so that a respective field in the first fields may be associated with a respective field in the second fields, thereby reflecting the depth value corresponding to each pixel.
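The two single-image layouts described above (consecutive fields as in FIG. 20 and alternating fields as in FIG. 21) can be sketched with plain lists, where each list element stands for one field. The function names and the use of Python lists as "fields" are assumptions for illustration. Because the depth values are stored in the same order as the pixels, the depth value of a pixel can be located by index alone.

```python
def store_consecutive(pixels, depths):
    """FIG. 20 style: all first fields (pixels), then all second fields (depths)."""
    return list(pixels) + list(depths)


def store_alternating(pixels, depths):
    """FIG. 21 style: pixel field, depth field, pixel field, depth field, ..."""
    if len(pixels) != len(depths):
        raise ValueError("alternating storage needs one depth value per pixel")
    fields = []
    for pixel, depth in zip(pixels, depths):
        fields.extend((pixel, depth))
    return fields


def depth_of_pixel_consecutive(fields, pixel_count, index):
    """Depth fields start right after the `pixel_count` pixel fields."""
    return fields[pixel_count + index]


def depth_of_pixel_alternating(fields, index):
    """Each depth field directly follows its pixel field."""
    return fields[2 * index + 1]


pixels = ["pixel 1", "pixel 2", "pixel 3"]
depths = [10, 20, 30]
consecutive = store_consecutive(pixels, depths)
alternating = store_alternating(pixels, depths)
```

Both layouts carry the same information; they differ only in where the associated second field sits relative to its first field.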

In implementations, the pixel data and the depth data of multiple images may be stored in various ways. Hereinafter, examples are provided for further explanation.

Referring to FIG. 22, respective pixels of image 1 are represented by image 1 pixel 1, image 1 pixel 2, and other pixels not shown in the figure, and may be stored in consecutive fields, which may be used as the first fields. The depth data of image 1 is represented by image 1 depth value 1, image 1 depth value 2, and the other depth data not shown in the figure, and may be stored in the fields adjacent to the first fields. These fields may be used as the second fields. Similarly, the pixel data of image 2 may be stored in the first fields, and the depth data of image 2 may be stored in the adjacent second fields.

Those skilled in the art may understand that respective images in the image stream or respective frame images in the video stream that are continuously captured by one capturing device of multiple synchronized capturing devices may be used as the above image 1 respectively. Similarly, among the multiple synchronized capturing devices, the image captured in synchronization with image 1 may be used as image 2. The capturing device may be the capturing device shown in FIG. 2, or capturing devices in other scenarios.

Referring to FIG. 23, the pixel data of image 1 and the pixel data of image 2 may be stored in multiple adjacent first fields, and the depth data of image 1 and the depth data of image 2 may be stored in multiple adjacent second fields.

Referring to FIG. 24, the pixel data of each image in the multiple images may be stored in multiple fields respectively, and these fields may be used as the first fields. Fields storing the pixel data may be interleaved with fields storing the depth values.

Referring to FIG. 25, the pixel data and the depth values of different images may also be arranged in the interleaving manner. For example, image 1 pixel 1, image 1 depth value 1, image 2 pixel 1, image 2 depth value 1, . . . may be sequentially stored until the completion of storing the pixel data and the depth data corresponding to the first pixel of each image of the multiple images. The adjacent fields thereof store image 1 pixel 2, image 1 depth value 2, image 2 pixel 2, image 2 depth value 2, . . . until the completion of storing the pixel data and the depth data of each image.
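The cross-image interleaving order described for FIG. 25 can be sketched as below. The function name `interleave_images` and the representation of each image as a `(pixels, depths)` pair of equal-length sequences are assumptions made for illustration.

```python
def interleave_images(images):
    """Interleave pixel and depth fields of several synchronized images.

    `images` is a list of (pixels, depths) pairs. The output order is:
    image 1 pixel 1, image 1 depth 1, image 2 pixel 1, image 2 depth 1, ...,
    then image 1 pixel 2, image 1 depth 2, image 2 pixel 2, and so on.
    """
    if not images:
        return []
    count = len(images[0][0])
    for pixels, depths in images:
        if len(pixels) != count or len(depths) != count:
            raise ValueError("every image needs the same number of pixels and depths")
    fields = []
    for k in range(count):              # k-th pixel of every image
        for pixels, depths in images:   # image 1, image 2, ...
            fields.append(pixels[k])
            fields.append(depths[k])
    return fields


image_1 = (["i1 p1", "i1 p2"], [11, 12])
image_2 = (["i2 p1", "i2 p2"], [21, 22])
stream = interleave_images([image_1, image_2])
```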

In summary, the fields storing the pixel data of each image may be used as the first fields, and the fields storing the depth data of the image may be used as the second fields. For each image, the first fields and the second fields associated with the first fields may be stored respectively.

Those skilled in the art may understand that the above respective example embodiments are merely examples, and are not specific limitations on the type, size, and arrangement of the fields.

Referring to FIG. 3, the multi-angle free-perspective data including the first fields and the second fields may be stored in a server 32 in the cloud, and may be transmitted to the CDN or to the device 33 that performs displaying, for reconstructing the image.

In implementations, both the first fields and the second fields may be pixel fields in the stitched image. The stitched image is used to store the pixel data and the depth data of the multiple images. By using an image format for data storage, the amount of data, the time length of data transmission, and the resource occupation may be reduced.

The stitched image may be an image in various formats such as BMP format, JPEG format, PNG format, and the like. These image formats may be compressed or uncompressed formats. Those skilled in the art may understand that the image in various formats may include fields corresponding to respective pixels, which are referred to as pixel fields. The size of the stitched image, i.e., parameters like the number of pixels and the aspect ratio of the stitched image, may be determined according to the needs, for example, based on the number of the multiple synchronized images, the amount of data to be stored in each image, the amount of the depth data to be stored in each image, and other factors.

In implementations, among the multiple synchronized images, the number of bits of the depth data corresponding to the pixels of each image and the number of bits of the pixel data may be associated with the format of the stitched image.

For example, when the format of the stitched image is the BMP format, the range of the depth value may be 0-255, which is 8-bit data, and the data may be stored as the gray value in the stitched image. Alternatively, the depth value may also be 16-bit data, which may be stored as the gray value at two pixel positions in the stitched image, or stored in two channels at one pixel position in the stitched image.

When the format of the stitched image is the PNG format, the depth value may also be 8-bit or 16-bit data. In the PNG format, the 16-bit depth value may be stored as the gray value of one pixel position in the stitched image.
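As a minimal sketch of storing a 16-bit depth value in two 8-bit slots (two pixel positions, or two channels at one pixel position), the value can be split into a high byte and a low byte and joined again on the reading side. The function names and the high-byte-first ordering are assumptions for illustration; the disclosure does not specify a byte order.

```python
def split_depth16(depth):
    """Split a 16-bit depth value into (high byte, low byte), each in 0-255."""
    if not 0 <= depth <= 0xFFFF:
        raise ValueError("depth must fit in 16 bits")
    return depth >> 8, depth & 0xFF


def join_depth16(high, low):
    """Rebuild the 16-bit depth value from its two 8-bit parts."""
    if not (0 <= high <= 0xFF and 0 <= low <= 0xFF):
        raise ValueError("each part must fit in 8 bits")
    return (high << 8) | low


# Illustrative value: 0x1234 is stored as the two gray values 0x12 and 0x34.
high, low = split_depth16(0x1234)
restored = join_depth16(high, low)
```

The writer and the reader only need to agree on which slot holds the high byte.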

Those skilled in the art may understand that the above example embodiments are not limitations on the storage manner or the number of data bits, and other data storage manners that may be implemented by those skilled in the art fall within the protection scope of the present disclosure.

In implementations, the stitched image may be split into an image area and a depth map area. The pixel fields of the image area store the pixel data of the multiple images, and the pixel fields of the depth map area store the depth data of the multiple images. The pixel fields storing the pixel data of each image in the image area are used as the first fields, and the pixel fields storing the depth data of each image in the depth map area are used as the second fields.

In implementations, the image area may be a continuous area, and the depth map area may also be a continuous area.

Further, in implementations, the stitched image may be equally split, and the two split parts are used as the image area and the depth map area respectively. Alternatively, the stitched image may also be split in an unequal manner according to the amount of the pixel data and the amount of the depth data of the image to be stored.

For example, referring to FIG. 26, where each minimum square represents one pixel, the image area may be area 1 within the dashed frame, i.e., the upper half area after the stitched image is split equally up and down. The lower half area of the stitched image may be used as the depth map area.

Those skilled in the art may understand that FIG. 26 is merely for illustration, and the number of the minimum squares therein is not a limitation on the number of pixels of the stitched image. In addition, the equal splitting may also be performed by equally splitting the stitched image left and right.

In implementations, the image area may include multiple image sub-areas. Each image sub-area is used to store one of the multiple images. The pixel fields of each image sub-area may be used as the first fields. Accordingly, the depth map area may include multiple depth map sub-areas. Each depth map sub-area is used to store the depth data of one of the multiple images. The pixel fields of each depth map sub-area may be used as the second fields.

The number of image sub-areas and the number of depth map sub-areas may be equal, both of which are equal to the number of multiple synchronized images. In other words, the number of image sub-areas and the number of depth map sub-areas may be equal to the number of cameras described above.

Referring to FIG. 27, equally splitting the stitched image up and down is still taken as an example for further description. The upper half of the stitched image in FIG. 27 is the image area, which is split into 8 image sub-areas that store the pixel data of the 8 synchronized images respectively. Each image has a different shooting angle, i.e., a different perspective. The lower half of the stitched image is the depth map area, which is split into 8 depth map sub-areas that store the depth maps of the 8 images respectively.
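The equal up-and-down split with 8 image sub-areas and 8 depth map sub-areas can be sketched by computing the rectangle of each sub-area inside the stitched image. The function name `stitched_layout`, the grid of 4 columns per row, and the 1920x1080 sub-area size are assumptions for illustration; the actual arrangement in FIG. 27 may differ.

```python
def stitched_layout(width, height, image_count, columns):
    """Return sub-area rectangles (left, top, width, height) of a stitched image.

    The upper half is the image area and the lower half is the depth map area.
    Sub-area i of each half is placed row by row in a grid with `columns` columns.
    """
    if image_count <= 0 or columns <= 0 or image_count % columns != 0:
        raise ValueError("image_count must be a positive multiple of columns")
    rows = image_count // columns
    half = height // 2
    sub_w = width // columns
    sub_h = half // rows
    image_areas, depth_areas = [], []
    for i in range(image_count):
        row, col = divmod(i, columns)
        image_areas.append((col * sub_w, row * sub_h, sub_w, sub_h))
        depth_areas.append((col * sub_w, half + row * sub_h, sub_w, sub_h))
    return {"image": image_areas, "depth": depth_areas}


# Illustrative values: 8 perspectives of 1920x1080 each, 4 sub-areas per row.
layout = stitched_layout(width=7680, height=4320, image_count=8, columns=4)
```

With this layout, the depth map sub-area of perspective i sits exactly half the stitched-image height below the image sub-area of the same perspective.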

With reference to the descriptions above, the pixel data of the synchronized 8 images, i.e., perspective 1 image to perspective 8 image, may be the original images obtained from the cameras, or may be images after the original images are reduced in resolution. The depth data is stored in a partial area of the stitched image and may also be referred to as the depth map.

As described above, in implementations, the stitched image may also be split in an unequal manner. For example, referring to FIG. 28, the number of pixels occupied by the depth data may be less than the number of pixels occupied by the pixel data of the image. Then, the image area and the depth map area may have different sizes. For example, the depth data may be obtained by quarter-down-sampling the depth map, and a splitting manner as shown in FIG. 28 may be used. The number of pixels occupied by the depth map may also be greater than the number of pixels occupied by the pixel data of the image.
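One way to read the quarter-down-sampling mentioned above is to keep every second depth value in each direction, so that the down-sampled depth map occupies a quarter of the original pixel count. That reading, the function name `downsample_depth`, and the plain subsampling without filtering are assumptions for illustration.

```python
def downsample_depth(depth_map, step=2):
    """Keep every `step`-th row and every `step`-th value within each kept row."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return [row[::step] for row in depth_map[::step]]


# Illustrative 4x4 original depth map (one depth value per image pixel).
original = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
quarter = downsample_depth(original)  # 2x2 depth map, a quarter of the values
```

A depth map reduced this way needs a correspondingly smaller depth map area, which is one reason the stitched image may be split unequally.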

Those skilled in the art may understand that FIG. 28 is not a limitation on the splitting of the stitched images in the unequal manner. In implementations, the number of pixels and the aspect ratio of the stitched image may be various, and the splitting manner may also be various.

In implementations, the image area or the depth map area may also include multiple areas. For example, as shown in FIG. 29, the image area may be a continuous area, and the depth map area may include two continuous areas.

Alternatively, referring to FIG. 30 and FIG. 31, the image area may include two continuous areas, and the depth map area may also include two continuous areas. The image areas and the depth map areas may be arranged in an interleaving manner.

Alternatively, referring to FIG. 32, the image sub-areas included in the image area may be arranged in an interleaving manner with the depth map sub-areas included in the depth map area. The number of continuous areas included in the image area may be equal to the number of image sub-areas, and the number of continuous areas included in the depth map area may be equal to the number of depth map sub-areas.

In implementations, the pixel data of each image may be stored in the image sub-areas in the order of the arrangement of pixel points. The depth data of each image may also be stored in the depth map sub-areas in the order of the arrangement of pixel points.

Referring to FIG. 33 to FIG. 35, FIG. 33 illustrates image 1 with 9 pixels, and FIG. 34 illustrates image 2 with 9 pixels, where image 1 and image 2 are two synchronized images with different angles. According to image 1 and image 2, the depth data corresponding to image 1 may be obtained, including image 1 depth value 1 to image 1 depth value 9. Also, the depth data corresponding to image 2 may be obtained, including image 2 depth value 1 to image 2 depth value 9.

Referring to FIG. 35, when image 1 is stored in the image sub-areas, image 1 may be stored in the upper-left image sub-area in the order of the arrangement of pixel points. That is, in the image sub-area, the arrangement of pixel points may be the same as in image 1. When image 2 is stored in the image sub-areas, image 2 may similarly be stored in the upper-right image sub-area in this manner.

Similarly, when the depth data of image 1 is stored into the depth map sub-areas, the depth data of image 1 may be stored in a similar manner. In the case where the depth values correspond to the pixel values of the image on a one-to-one basis, the depth data of image 1 may be stored in a manner as shown in FIG. 35. If the depth values are obtained after down-sampling the original depth map, the depth data of image 1 may be stored in the depth map sub-areas in the order of the arrangement of pixel points of the depth map obtained after the down-sampling.
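The storage in the order of the arrangement of pixel points may be sketched as follows for two 9-pixel images. This is a minimal sketch: the helper names `blit`, `grid`, and `stitch_two_views` are hypothetical, and placing the two depth maps directly below their images is an assumption for illustration rather than a statement of the exact layout of FIG. 35.

```python
def blit(canvas, block, top, left):
    """Copy a 2-D block into the canvas, keeping the block's pixel order."""
    for r, row in enumerate(block):
        for c, value in enumerate(row):
            canvas[top + r][left + c] = value


def grid(prefix, size=3):
    """Build a size x size block labelled prefix1 .. prefixN in row order."""
    return [[f"{prefix}{size * r + c + 1}" for c in range(size)]
            for r in range(size)]


def stitch_two_views(img1, img2, depth1, depth2):
    """Place two images in the upper half and their depth maps below them."""
    h, w = len(img1), len(img1[0])
    canvas = [[None] * (2 * w) for _ in range(2 * h)]
    blit(canvas, img1, 0, 0)    # upper-left image sub-area
    blit(canvas, img2, 0, w)    # upper-right image sub-area
    blit(canvas, depth1, h, 0)  # assumed: depth map of image 1 below image 1
    blit(canvas, depth2, h, w)  # assumed: depth map of image 2 below image 2
    return canvas


stitched = stitch_two_views(grid("I1P"), grid("I2P"), grid("I1D"), grid("I2D"))
```

Because every block is copied row by row, neighbouring pixels of each image remain neighbours in the stitched image.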

Those skilled in the art may understand that the compression ratio of compressing the image is related to the association of respective pixel points in the image. The stronger the association is, the higher the compression ratio is. Since the captured image corresponds to the real world, the association of respective pixel points is strong. By storing the pixel data and the depth data of the image in the order of the arrangement of pixel points, the compression ratio when compressing the stitched image may be higher. That is, the amount of data after compression may be made smaller if the amount of data before compression is the same.

When the stitched image is split into the image area and the depth map area, and multiple image sub-areas are adjacent in the image area or multiple depth map sub-areas are adjacent in the depth map area, a higher compression ratio may also be obtained when the stitched image is compressed. This is because the data stored in the respective image sub-areas is obtained from images, or from frame images in videos, taken from different angles of the to-be-viewed area, and all the depth maps are stored in the depth map area.

In implementations, padding may be performed on all or some of the image sub-areas and the depth map sub-areas. The padding may take various forms. For example, taking the perspective 1 depth map in FIG. 31 as an example, redundant pixels may be set around the original perspective 1 depth map. Alternatively, the number of pixels occupied by the original perspective 1 depth map may be maintained, while redundant pixels that do not actually store pixel data are reserved around its border, and the original perspective 1 depth map is reduced in size and stored in the remaining pixels. Alternatively, other manners may be used, as long as redundant pixels are ultimately set aside between the perspective 1 depth map and the other surrounding images.

Because the stitched image includes multiple images and depth maps, the association between adjacent borders of respective images is poor. By performing padding, quality loss of the images and the depth maps in the stitched image may be reduced when the stitched image is compressed.
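One possible form of padding is sketched below. The description does not specify what values the redundant pixels hold; replicating the nearest border pixel is an assumption chosen for this sketch, and the function name `pad_replicate` is hypothetical.

```python
def pad_replicate(block, margin):
    """Surround a 2-D block with `margin` redundant pixels on every side.

    Each redundant pixel repeats the nearest border pixel of the block
    (edge replication, an illustrative choice).
    """
    h, w = len(block), len(block[0])
    padded = []
    for r in range(-margin, h + margin):
        src_r = min(max(r, 0), h - 1)  # clamp the row index into the block
        padded.append([block[src_r][min(max(c, 0), w - 1)]
                       for c in range(-margin, w + margin)])
    return padded


padded = pad_replicate([[1, 2], [3, 4]], margin=1)
```

With such a margin, the boundary between two adjacent sub-areas falls inside redundant pixels rather than on pixels that are used for reconstruction.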

In implementations, the pixel field of the image sub-area may store three-channel data, and the pixel field of the depth map sub-area may store single-channel data. The pixel field of the image sub-area is used to store the pixel data of any one of the multiple synchronized images. The pixel data is usually three-channel data, such as RGB data or YUV data.

The depth map sub-areas are used to store the depth data of the image. If the depth value is 8-bit binary data, a single channel of the pixel field may be used for storage. If the depth value is 16-bit binary data, two channels of the pixel field may be used for storage. Alternatively, the depth value may also be stored in a larger pixel area. For example, if the multiple synchronized images are all 1920*1080 images and the depth values are 16-bit binary data, the depth values may also be stored in an image area twice the size of 1920*1080, i.e., two 1920*1080 image areas, each of which uses a single channel. The stitched image may also be split in combination with the storage manner.
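Storing a 16-bit depth value in two 8-bit channels may be sketched as follows. The function names are hypothetical, and putting the high byte in the first channel is an assumption of this sketch; the description does not fix the byte order.

```python
def split_depth16(value):
    """Split a 16-bit depth value into (high byte, low byte)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("depth value must fit in 16 bits")
    return value >> 8, value & 0xFF


def merge_depth16(high, low):
    """Rebuild the 16-bit depth value from its two 8-bit channels."""
    return (high << 8) | low


high, low = split_depth16(0x1234)
restored = merge_depth16(high, low)
```

The same two bytes may instead be written to two separate single-channel areas, which corresponds to the larger-pixel-area alternative described above.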

When each channel of each pixel of the stitched image occupies 8 bits, the uncompressed amount of data of the stitched image may be calculated according to the following formula: the number of the multiple synchronized images*(the amount of data of the pixel data of one image+the amount of data of one depth map).

If the original image has a resolution of 1080P, i.e., 1920*1080 pixels in a progressive scan format, the original depth map may also occupy 1920*1080 pixels with a single channel. The amount of data of pixels of the original image is 1920*1080*8*3 bits, and the amount of data of the original depth map is 1920*1080*8 bits. If the number of cameras is 30, the amount of data of pixels of the stitched image is 30*(1920*1080*8*3+1920*1080*8) bits, which is about 237 MB. If not compressed, the stitched image will occupy a lot of system resources and incur a large delay. Especially when the bandwidth is small, for example, 1 MB/s, the uncompressed stitched image needs about 237 seconds to be transmitted. The real-time performance is poor, and the user experience needs to be improved.
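The arithmetic of this example may be checked with the short calculation below. It is a sketch under the stated assumptions (30 cameras, 1920*1080 pixels, three 8-bit image channels, one 8-bit depth channel); the function name `stitched_image_bits` is hypothetical. The total is 1,990,656,000 bits, i.e., roughly 237 megabytes, so a transmission time of about 237 seconds corresponds to a link carrying about one megabyte per second, while a link of one megabit per second would take about eight times longer.

```python
def stitched_image_bits(views, width, height,
                        image_channels=3, depth_channels=1,
                        bits_per_channel=8):
    """Uncompressed size in bits: views * (pixel data + depth map)."""
    pixels = width * height
    image_bits = pixels * bits_per_channel * image_channels
    depth_bits = pixels * bits_per_channel * depth_channels
    return views * (image_bits + depth_bits)


bits = stitched_image_bits(views=30, width=1920, height=1080)
mebibytes = bits / 8 / 2**20             # size in units of 2**20 bytes
seconds_at_1_mbit = bits / 1_000_000     # link of 1 megabit per second
seconds_at_1_mbyte = bits / 8 / 2**20    # link of 2**20 bytes per second
```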

The amount of data of the stitched image may be reduced in one or more of the following manners: storing the data regularly to obtain a higher compression ratio; reducing the resolution of the original image and using the pixel data with reduced resolution as the pixel data of the image; performing down-sampling on one or more of the original depth maps; and the like.

For example, if the resolution of the original image is 4K, i.e., a pixel resolution of 4096*2160, and the down-sampled image has a resolution of 540P, i.e., a pixel resolution of 960*540, the number of pixels of the stitched image is approximately one-sixteenth of the number of pixels before down-sampling. In combination with any one or more of the other manners for reducing the amount of data described above, the amount of data may be made smaller.
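The reduction factor in this example may be verified as follows; the helper name `pixel_ratio` is hypothetical. For a 4096*2160 source the factor is about 17, which is consistent with "approximately one-sixteenth", and it is exactly 16 when the source is 3840*2160.

```python
def pixel_ratio(source, target):
    """Ratio of pixel counts between two (width, height) resolutions."""
    return (source[0] * source[1]) / (target[0] * target[1])


ratio_dci_4k = pixel_ratio((4096, 2160), (960, 540))
ratio_uhd_4k = pixel_ratio((3840, 2160), (960, 540))
```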

Those skilled in the art may understand that if the bandwidth permits and the decoding capability of the device that performs data processing can support a stitched image with higher resolution, the stitched image with higher resolution may also be generated to improve the image quality.

Those skilled in the art may understand that in different application scenarios, the pixel data and the depth data of the multiple synchronized images may also be stored in other manners, for example, stored in the stitched image in units of pixel points. Referring to FIG. 33, FIG. 34, and FIG. 36, image 1 and image 2 shown in FIG. 33 and FIG. 34 may be stored in the stitched image in the manner of FIG. 36.

In summary, the pixel data and the depth data of the image may be stored in the stitched image. The stitched image may be split into the image area and the depth map area in various manners. Alternatively, the pixel data and the depth data may be stored in the stitched image in a preset order without splitting.

In implementations, the multiple synchronized images may also be multiple synchronized frame images obtained by decoding multiple videos. The videos may be acquired by multiple cameras, and the settings thereof may be the same as or similar to the cameras that acquire the images as described above.

In implementations, generating the multi-angle free-perspective image data may further include generating the association relationship field, and the association relationship field may indicate the association relationship between the first field and at least one second field. The first field stores the pixel data of one of the multiple synchronized images, and the second field stores the depth data corresponding to the image, where the first field and the second field correspond to the same shooting angle, i.e., the same perspective. The association relationship between the first field and the second field may be described by the association relationship field.

Taking FIG. 27 as an example, the area where perspective 1 image to perspective 8 image are stored in FIG. 27 includes 8 first fields, and the area where perspective 1 depth map to perspective 8 depth map are stored includes 8 second fields. There is an association relationship between the first field of perspective 1 image and the second field of perspective 1 depth map. Similarly, there is an association relationship between the field storing the perspective 2 image and the field storing the perspective 2 depth map.

The association relationship field may indicate the association relationship between the first field and the second field of each image of the multiple synchronized images in various manners. For example, the association relationship field may be the content storage rules of the pixel data and the depth data of the multiple synchronized images, that is, the association relationship between the first field and the second field is indicated by indicating the storage manner described above.

In implementations, the association relationship field may include only a mode number, where different mode numbers denote different storage manners. The device that performs data processing may learn the storage manner of the pixel data and the depth data in the obtained multi-angle free-perspective image data according to the mode number in the field and the data stored in the device that performs data processing. For example, if the received mode number is 1, the storage manner is parsed as follows. The stitched image is equally split into two areas up and down, where the upper half area is the image area, and the lower half area is the depth map area. The image at a certain position in the upper half area is associated with the depth map stored at the corresponding position in the lower half area.
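How a device might resolve a mode number into an association rule is sketched below for mode number 1 only. The table `STORAGE_MODES` and the function `associated_depth_origin` are hypothetical names, and the table stands in for "the data stored in the device that performs data processing".

```python
# Hypothetical table kept on the device: mode number -> storage manner.
STORAGE_MODES = {
    1: "equal up/down split: upper half image area, lower half depth map area",
}


def associated_depth_origin(mode, image_origin, stitched_height):
    """Return the top-left corner of the depth map associated with an image.

    Under mode 1, the depth map sits at the corresponding position in the
    lower half, i.e., the image position shifted down by half the height.
    """
    if mode not in STORAGE_MODES:
        raise KeyError(f"unknown mode number: {mode}")
    x, y = image_origin
    return (x, y + stitched_height // 2)


depth_origin = associated_depth_origin(1, image_origin=(960, 540),
                                       stitched_height=2160)
```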

Those skilled in the art may understand that the manners of storing the stitched image in the above example embodiments, for example, the storage manners illustrated in FIG. 27 to FIG. 36, may be described by a corresponding association relationship field, so that the device that performs data processing may obtain the associated image and depth data according to the association relationship field.

As described above, the picture format of the stitched image may be any one of the image formats such as BMP, PNG, JPEG, WebP, and the like, or other image formats. The storage manner of the pixel data and the depth data in multi-angle free-perspective image data is not limited to the manner of the stitched image. The pixel data and the depth data in multi-angle free-perspective image data may be stored in various manners, and may also be described by the association relationship field.

Similarly, the storage manner may also be indicated by a mode number. For example, in the storage manner shown in FIG. 23, the association relationship field may store the mode number 2. After reading the mode number, the device that performs data processing may parse that the pixel data of the multiple synchronized images are stored sequentially. The device that performs data processing may also parse the length of the first field and the length of the second field, where the depth data of each image is stored in the same storage order as the image after the storage of the multiple first fields is complete. Further, the device that performs data processing may determine the association relationship between the pixel data and the depth data of the image according to the association relationship field.
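The sequential storage manner described for mode number 2 may be sketched as follows, assuming for simplicity that every first field has the same length and every second field has the same length. The function name `split_sequential` and the byte-string payload are illustrative assumptions.

```python
def split_sequential(payload, views, first_len, second_len):
    """Pair each image's pixel data with its depth data.

    Layout assumed: all first fields (pixel data) in perspective order,
    followed by all second fields (depth data) in the same order.
    """
    if len(payload) != views * (first_len + second_len):
        raise ValueError("payload length does not match the field lengths")
    depth_base = views * first_len  # second fields start after all first fields
    pairs = []
    for i in range(views):
        pixel = payload[i * first_len:(i + 1) * first_len]
        start = depth_base + i * second_len
        pairs.append((pixel, payload[start:start + second_len]))
    return pairs


pairs = split_sequential(b"AAABBBab", views=2, first_len=3, second_len=1)
```

Here the i-th first field and the i-th second field share the same perspective, which is the association relationship the field conveys.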

Those skilled in the art may understand that storage manners of the pixel data and the depth data of the multiple synchronized images may be various, and expression manners of the association relationship field may also be various. The association relationship field may be indicated by the above mode number or may directly indicate the content. The device that performs data processing may determine the association relationship between the pixel data and the depth data of the image according to the content of the association relationship field, with reference to stored data or other prior knowledge such as the content corresponding to each mode number, the specific number of the multiple synchronized images, and the like.

In implementations, generating the multi-angle free-perspective image data may further include calculating and storing parameter data of each image based on the multiple synchronized images, where the parameter data includes data of the shooting position and the shooting angle of the image.

With reference to the shooting position and the shooting angle of each image of the multiple synchronized images, the device that performs data processing may determine the virtual viewpoint in the same coordinate system with reference to the user's needs, and perform the reconstruction of the image based on the multi-angle free-perspective image data, to show the user the expected viewing position and perspective.

In implementations, the parameter data may further include internal parameter data. The internal parameter data includes attribute data of the image capturing device. The above data of the shooting position and shooting angle of the image may also be referred to as external parameter data. The internal parameter data and external parameter data may be referred to as attitude data. With reference to the internal parameter data and external parameter data, factors indicated by the internal parameter data, such as lens distortion, may be taken into account during image reconstruction, and the image of the virtual viewpoint may be reconstructed more accurately.

In implementations, generating the multi-angle free-perspective image data may further include generating a parameter data storage address field, where the parameter data storage address field is used to indicate the storage address of the parameter data. The device that performs data processing may obtain the parameter data from the storage address of the parameter data.

In implementations, generating the multi-angle free-perspective image data may further include generating a data combination storage address field, which is used to indicate the storage address of the data combination, i.e., to indicate the storage addresses of the first field and the second field of each image of the multiple synchronized images. The device that performs data processing may obtain the pixel data and the depth data of the multiple synchronized images from the storage space corresponding to the storage address of the data combination. From this perspective, the data combination includes the pixel data and the depth data of the multiple synchronized images.

Those skilled in the art may understand that the multi-angle free-perspective image data may include specific data such as the pixel data of the image, the depth data of the image, the parameter data, and the like, as well as other indicative data such as the above generated association relationship field, parameter data storage address field, data combination storage address field, and the like. These pieces of indicative data may be stored in the data header file to instruct the device that performs data processing to obtain the data combination, the parameter data, and the like.
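A data header file carrying the indicative data may be sketched as a small record. The class name `HeaderFile`, the JSON serialization, and the example addresses are all hypothetical; the description does not prescribe a header syntax.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HeaderFile:
    """Indicative data only; the pixel, depth, and parameter data live elsewhere."""
    association_mode: int          # association relationship field (mode number)
    parameter_data_address: str    # parameter data storage address field
    data_combination_address: str  # data combination storage address field


def dump_header(header):
    """Serialize the header to text (JSON is an illustrative choice)."""
    return json.dumps(asdict(header), sort_keys=True)


def load_header(text):
    """Parse the text produced by dump_header back into a HeaderFile."""
    return HeaderFile(**json.loads(text))


header = HeaderFile(association_mode=1,
                    parameter_data_address="example/params.bin",    # hypothetical
                    data_combination_address="example/stitched.png")  # hypothetical
round_tripped = load_header(dump_header(header))
```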

In implementations, for the terminology explanations, implementation manners, and beneficial effects involved in respective example embodiments of generating multi-angle free-perspective data, reference may be made to other example embodiments. Moreover, various implementations of the multi-angle free-perspective interaction method may be implemented in combination with other example embodiments.

The multi-angle free-perspective data may be multi-angle free-perspective video data. Hereinafter, a method for generating multi-angle free-perspective video data is further described.

Referring to FIG. 37, a method 3700 for generating multi-angle free-perspective video data may include the following steps:

Step S3702, acquiring multiple frame-synchronized videos, where the shooting angles of the multiple videos are different;

Step S3704, parsing each video to obtain the image combinations at multiple frame moments, where the image combination includes multiple frame-synchronized frame images;

Step S3706, determining the depth data of each frame image in the image combination based on the image combination of each frame moment in the multiple frame moments;

Step S3708, generating a stitched image corresponding to each frame moment, where the stitched image includes a first field storing the pixel data of each frame image in the image combination, and a second field storing the depth data of each frame image in the image combination;

Step S3710, generating video data based on the multiple stitched images.
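The flow of steps S3704 to S3708 may be sketched as follows. The depth estimation and stitching operations are passed in as caller-supplied functions because the sketch does not assume any particular algorithm for them; the function names are hypothetical, and the encoding of step S3710 is left out.

```python
def image_combinations(videos):
    """Group frame-synchronized frames by frame moment (step S3704).

    `videos` is a list with one list of decoded frames per camera.
    """
    if len({len(frames) for frames in videos}) != 1:
        raise ValueError("all videos must contain the same number of frames")
    return [list(combo) for combo in zip(*videos)]


def generate_stitched_sequence(videos, estimate_depth, stitch):
    """Return one stitched image per frame moment, in frame order."""
    stitched = []
    for combo in image_combinations(videos):
        depths = estimate_depth(combo)           # step S3706
        stitched.append(stitch(combo, depths))   # step S3708
    return stitched  # step S3710 would encode and package this sequence


videos = [["a0", "a1"], ["b0", "b1"]]  # two cameras, two frame moments
sequence = generate_stitched_sequence(
    videos,
    estimate_depth=lambda combo: [f"d({frame})" for frame in combo],
    stitch=lambda combo, depths: tuple(combo) + tuple(depths),
)
```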

In an example embodiment, the capturing device may be the camera. Multiple frame-synchronized videos may be acquired through multiple cameras. Each video includes frame images at multiple frame moments. Multiple image combinations may correspond to different frame moments respectively. Each image combination includes multiple frame-synchronized frame images.

In implementations, the depth data of each frame image in the image combination is determined based on the image combination at each frame moment in the multiple frame moments.

Following the previous example embodiment, if the frame image in the original video has a resolution of 1080P, i.e., 1920*1080 pixels in a progressive scan format, the original depth map may also occupy 1920*1080 pixels with a single channel. The amount of data of pixels of the original image is 1920*1080*8*3 bits. The amount of data of the original depth map is 1920*1080*8 bits. If the number of cameras is 30, the amount of data of pixels of the stitched image is 30*(1920*1080*8*3+1920*1080*8) bits, which is about 237 MB. If not compressed, the stitched image will occupy a lot of system resources and incur a large delay. Especially when the bandwidth is small, for example, 1 MB/s, the uncompressed stitched image needs about 237 seconds to be transmitted. If the original stitched images are transmitted at the frame rate, real-time video playing is difficult to achieve.

The amount of data of the stitched images may be reduced in one or more of the following manners. Through regular storage, a higher compression ratio may be obtained when the stitched images are compressed in the video format. Alternatively, the original image may be reduced in resolution, and the pixel data after resolution reduction may be used as the pixel data of the image. Alternatively, down-sampling may be performed on one or more of the original depth maps. Alternatively, the video compression ratio may be increased, or other manners may be used.

For example, if the resolution of the frame images in the original videos, i.e., the obtained multiple videos, is 4K, i.e., a pixel resolution of 4096*2160, and the down-sampled frame images have a resolution of 540P, i.e., a pixel resolution of 960*540, the number of pixels of the stitched image is approximately one-sixteenth of the number of pixels before down-sampling. In combination with any one or more of the other manners for reducing the amount of data described above, the amount of data may be made smaller.

Those skilled in the art may understand that if the bandwidth permits and the decoding capability of the device that performs data processing can support a stitched image with higher resolution, the stitched image with higher resolution may also be generated to improve the image quality.

In implementations, generating video data based on the multiple stitched images may be generating video data based on all or some of the stitched images, which may be determined according to the frame rate of the video to be generated and the frame rate of the obtained video, or may be determined based on the bandwidth of communication with the device that performs data processing.
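Selecting some of the stitched images according to the two frame rates may be sketched as below. The function name `select_frames` and the rule of keeping the earliest source frame for each output frame are assumptions of this sketch; integer frame rates are assumed, with the target rate not higher than the source rate.

```python
def select_frames(count, source_fps, target_fps):
    """Return the indices of the stitched images kept for the output video."""
    if not 0 < target_fps <= source_fps:
        raise ValueError("target_fps must be positive and not exceed source_fps")
    indices = []
    k = 0
    # Output frame k maps to source frame floor(k * source_fps / target_fps).
    while (k * source_fps) // target_fps < count:
        indices.append((k * source_fps) // target_fps)
        k += 1
    return indices


halved = select_frames(count=10, source_fps=60, target_fps=30)
```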

In implementations, generating video data based on the multiple stitched images may be encoding and packaging the multiple stitched images in the order of frame moments to generate the video data.

In an example embodiment, the packaging format may be any one of formats such as AVI, QuickTime File Format, MPEG, WMV, Real Video, Flash Video, Matroska, and the like, or other packaging formats. The encoding format may be encoding formats of H.261, H.263, H.264, H.265, MPEG, AVS, and the like, or other encoding formats.

In implementations, generating the multi-angle free-perspective video data may further include generating the association relationship field. The association relationship field may indicate the association relationship between the first field and at least one second field. The first field stores the pixel data of one of the multiple synchronized images. The second field stores the depth data corresponding to the image. The first field and the second field correspond to the same shooting angle, i.e., the same perspective.

In implementations, generating the multi-angle free-perspective video data may further include calculating and storing parameter data of each frame image based on the multiple synchronized frame images. The parameter data includes the data of the shooting position and shooting angle of the frame image.

In implementations, multiple frame-synchronized frame images in the image combinations at different moments in the multiple synchronized videos may correspond to the same parameter data. The parameter data may be calculated with any group of image combinations.

In implementations, generating the multi-angle free-perspective video data may further include generating a parameter data storage address field, where the parameter data storage address field is used to indicate a storage address of the parameter data. The device that performs data processing may obtain the parameter data from the storage address of the parameter data.

In implementations, generating the multi-angle free-perspective video data may further include generating a video data storage address field, where the video data storage address field is used to indicate a storage address of the generated video data.

Those skilled in the art may understand that the multi-angle free-perspective video data may include generated video data and other indicative data, such as the above generated association relationship field, parameter data storage address field, video data storage address field, and the like. These pieces of indicative data may be stored in the data header file to instruct the device that performs data processing to obtain the video data, the parameter data, and the like.

For the terminology explanations, implementation manners, and beneficial effects involved in respective example embodiments of generating multi-angle free-perspective video data, reference may be made to other example embodiments. Moreover, various implementations of the multi-angle free-perspective interaction method may be implemented in combination with other example embodiments.

Hereinafter, a method for processing multi-angle free-perspective data is further described.

FIG. 38 is a flowchart of a method 3800 for processing multi-angle free-perspective data in an example embodiment of the present disclosure, which may include the following steps:

Step S3802, acquiring the data header file;

Step S3804, determining the defined format of the data file according to the parsing result of the data header file;

Step S3806, reading the data combination from the data file based on the defined format, where the data combination includes the pixel data and the depth data of the multiple synchronized images, the multiple synchronized images have different perspectives with respect to the to-be-viewed area, and the pixel data and the depth data of each image of the multiple synchronized images have an association relationship;

Step S3808, performing image or video reconstruction of the virtual viewpoint according to the read data combination, where the virtual viewpoint is selected from the multi-angle free-perspective range, and the multi-angle free-perspective range is the range supporting the virtual viewpoint switching viewing of the to-be-viewed area.

The multi-angle free-perspective data in the example embodiment of the present disclosure is data capable of supporting image or video reconstruction of the virtual viewpoint within the multi-angle free-perspective range, and may include the data header file and the data file. The data header file may indicate the defined format of the data file, so that the device that performs data processing on the multi-angle free-perspective data may parse the required data from the data file according to the data header file. Hereinafter, further description is provided.

Referring to FIG. 3, the device that performs data processing may be a device located in the CDN, or the device 33 that performs displaying, or may be the device that performs data processing. Both the data file and the data header file may be stored on the server 32 in the cloud. Alternatively, in some application scenarios, the data header file may also be stored in the device that performs data processing, and the data header file is obtained locally.

In implementations, the stitched image in the above respective example embodiments may be used as the data file in the example embodiment of the present disclosure. In an application scenario where bandwidth is limited, the stitched image may be split into multiple parts and transmitted multiple times. Accordingly, the data header file may include the splitting manner. The device that performs data processing may follow the indications in the data header file to combine the split multiple parts to obtain the stitched image.
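Splitting a stitched image into parts and recombining them may be sketched as follows. Splitting by contiguous groups of pixel rows is an assumption made for illustration, as is recording the splitting manner as a small dictionary; the function names are hypothetical.

```python
def split_rows(rows, parts):
    """Split a list of pixel rows into at most `parts` contiguous chunks."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    size = -(-len(rows) // parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def combine_rows(chunks):
    """Concatenate the received chunks in order to rebuild the rows."""
    return [row for chunk in chunks for row in chunk]


rows = [[r] * 4 for r in range(7)]             # a tiny 7-row stand-in image
splitting_manner = {"by": "rows", "parts": 3}  # hypothetical header entry
chunks = split_rows(rows, splitting_manner["parts"])
rebuilt = combine_rows(chunks)
```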

In implementations, the defined format may include a storage format. The data header file may include a field indicating the storage format of the data combination. The field may indicate the storage format using a number. Alternatively, the storage format may be directly written in the field. Accordingly, the parsing result may be the number of the storage format, or the storage format.

Accordingly, the device that performs data processing may determine the storage format according to the parsing result. For example, the storage format may be determined according to the number and the stored supporting data. Alternatively, the storage format may also be obtained directly from the field indicating the storage format of the data combination. In other example embodiments, if the storage format is fixed in advance, the fixed storage format may also be recorded in the device that performs data processing.
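The three ways of determining the storage format may be sketched in one function. The numbering in `KNOWN_FORMATS` is hypothetical and plays the role of the stored supporting data; the function name `resolve_storage_format` is likewise an assumption.

```python
# Hypothetical supporting data stored on the device: number -> storage format.
KNOWN_FORMATS = {1: "PNG", 2: "JPEG", 3: "H.264 in Matroska"}


def resolve_storage_format(field, fixed_format=None):
    """Determine the storage format from the parsed header field.

    - an integer is looked up in the stored supporting data;
    - a string is taken as the format written directly in the field;
    - None falls back to a format fixed in advance on the device.
    """
    if field is None:
        if fixed_format is None:
            raise ValueError("no field and no fixed storage format")
        return fixed_format
    if isinstance(field, int):
        return KNOWN_FORMATS[field]
    return field


by_number = resolve_storage_format(2)
by_name = resolve_storage_format("BMP")
by_default = resolve_storage_format(None, fixed_format="PNG")
```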

In implementations, the storage format may be the picture format or the video format. As described above, the picture format may be any of the image formats such as BMP, PNG, JPEG, WebP, and the like, or other image formats. The video format may include the packaging format and the encoding format. The packaging format may be any one of formats such as AVI, QuickTime File Format, MPEG, WMV, Real Video, Flash Video, Matroska, and the like, or other packaging formats. The encoding format may be encoding formats of H.261, H.263, H.264, H.265, MPEG, AVS, and the like, or other encoding formats.

The storage format may also be a format other than the picture format or the video format, which is not limited herein. Various storage formats that may be indicated by the data header file or the stored supporting data, such that the device that performs data processing obtains the required data for subsequent reconstruction of the image or video of the virtual viewpoint, are all within the protection scope of the present disclosure.

In implementations, when the storage format of the data combination is the video format, the number of data combinations may be multiple. Each data combination may be a data combination corresponding to a different frame moment after decapsulating and decoding the video.

In implementations, the defined format may include the content storage rules of the data combination. The data header file may include a field indicating the content storage rules of the data combination. Through the content storage rules, the device that performs data processing may determine the association relationship between the pixel data and the depth data in each image. The field indicating the content storage rules of the data combination may also be referred to as the association relationship field. The field may indicate the content storage rules of the data combination using a number. Alternatively, the rules may be directly written in the field.

Accordingly, the device that performs data processing may determine the content storage rules of the data combination according to the parsing result. For example, the content storage rules may be determined according to the number and the stored supporting data. Alternatively, the content storage rules of the data combination may be obtained directly from the field indicating the content storage rules of the data combination.

In other example embodiments, if the content storage rules are fixed in advance, the fixed content storage rules of the data combination may also be recorded in the device that performs data processing. Hereinafter, the content storage rules of the data combination, and the implementation for the device that performs data processing to obtain the data combination with reference to indications of the data header file, are further described.

In implementations, the storage rules of the pixel data and the depth data of the multiple synchronized images may be the storage rules of the pixel data and the depth data of the multiple synchronized images in the stitched image.

As described above, the storage format of the data combination may be the picture format or the video format. Accordingly, the data combination may be a picture or a frame image in the video. The picture or the frame image stores the pixel data and the depth data of respective images of the multiple synchronized images. From this perspective, the picture or frame image obtained through decoding according to the picture format or video format may also be referred to as the stitched image. The storage rules of the pixel data and the depth data of the multiple synchronized images may be storage positions in the stitched image. The storage positions may be various. For the various storage manners of the pixel data and the depth data of the multiple synchronized images in the stitched image, reference may be made to the above descriptions, and details are not repeated herein.

In implementations, the content storage rules of the data combination may indicate to the device that performs data processing the various storage manners of the pixel data and the depth data of the multiple synchronized images in the stitched image, or may indicate, for each image, the storage manner of the first field and the second field in other storage manners. That is, the content storage rules indicate the storage rules of the pixel data and the depth data of the multiple synchronized images.

As described above, the data header file may include the field indicating the content storage rules of the data combination. The field may use a number to indicate the content storage rules of the data combination. Alternatively, the rules may be written directly in the data header file. Alternatively, the fixed content storage rules of the data combination may be recorded in the device that performs data processing.

The content storage rules may correspond to any one of the above storage manners. The device that performs data processing may parse the storage manner according to the content storage rules, further parse the data combination, and determine the association relationship between the pixel data and the depth data of each image of the multiple images.

In implementations, the content storage rules may be indicated by the distribution of the image area and the depth map area, that is, by the storage positions of the pixel data and the depth data of each image of the multiple synchronized images in the stitched image.

The indication may be a mode number. For example, if the mode number is 1, the content storage rules may be parsed as follows: the stitched image is split equally into an upper area and a lower area, where the upper half area is the image area, and the lower half area is the depth map area. The image at a certain position in the upper half area is associated with the depth map stored at the corresponding position in the lower half area. The device that performs data processing may further determine the storage manner based on the rules. For example, with reference to the number of the multiple synchronized images, the storage order of the pixel data and the depth data, the proportional relationship between the depth data and the pixel data occupying pixel points, etc., the device that performs data processing may further determine whether the storage manner is as shown in FIG. 27 or FIG. 28, or other storage manners.
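The parsing for mode number 1 may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present disclosure: the function name `split_mode_1`, the representation of the stitched image as a list of pixel rows, and the layout in which the images sit side by side in a single row with equal widths are all hypothetical.

```python
def split_mode_1(stitched, num_images):
    """Split a stitched image into (image, depth_map) pairs for mode number 1.

    Assumed (hypothetical) layout: the upper half is the image area, the
    lower half is the depth map area, and each half holds `num_images`
    equally wide tiles arranged left to right.
    """
    height = len(stitched)
    width = len(stitched[0])
    if height % 2 or width % num_images:
        raise ValueError("stitched image does not divide evenly")
    half = height // 2
    tile_width = width // num_images
    pairs = []
    for i in range(num_images):
        x0, x1 = i * tile_width, (i + 1) * tile_width
        # The tile in the upper half is associated with the tile at the
        # corresponding position in the lower half.
        image = [row[x0:x1] for row in stitched[:half]]
        depth_map = [row[x0:x1] for row in stitched[half:]]
        pairs.append((image, depth_map))
    return pairs
```

Other storage manners, such as interleaved sub-areas or down-sampled depth maps, would need a different slicing rule selected by the mode number.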

In implementations, the content storage rules may also be indicated by the distribution of the image sub-areas and the depth map sub-areas, that is, by the storage positions of the pixel data and the depth data of each image of the multiple synchronized images in the stitched image. The pixel data of each image of the multiple synchronized images are stored in the image sub-areas, and the depth data of each image of the multiple synchronized images are stored in the depth map sub-areas.

For example, the content storage rules may be that the image sub-areas and the depth map sub-areas are arranged in the interleaving manner. Similar to the previous example, the device that performs data processing may further determine the storage manner based on the rules. For example, with reference to the number of the multiple synchronized images, the storage order of the pixel data and the depth data, and the proportional relationship between the depth data and the pixel data occupying pixel points, etc., the storage manner may be further determined as the storage manner shown in FIG. 31, or other storage manners.

As described above, the first field storing the pixel data and the second field storing the depth data may be pixel fields in the stitched image, or may be fields that perform storing in other forms. Those skilled in the art may understand that the content storage rules may be the indication suitable for a storage manner, such that the device that performs data processing may learn the corresponding storage manner.

In implementations, the content storage rules may further include more information for supporting the device that performs data processing to parse the storage manner of the data combination. For example, information on whether all or some of the above image sub-areas and depth map sub-areas are padded, and the manner of padding, may be included. The content storage rules may also include the resolution relationship between the pixel data and the depth data of the image.

The device that performs data processing may determine the storage manner based on the stored information or information obtained from other fields of the data header file. For example, the above number of the multiple synchronized images may also be obtained through the data header file, and specifically may be obtained through the defined format of the data file parsed from the data header file.

After the storage manner is determined, the device that performs data processing may parse the pixel data and the corresponding depth data of the multiple synchronized images.

In implementations, the resolutions of the pixel data and the depth data may be the same, and then the pixel data and the corresponding depth values of respective pixel points of each image may be further determined.

As described above, the depth data may also be down-sampled data, which may be indicated by a corresponding field in the defined format in the data header file. The device that performs data processing may perform corresponding up-sampling to determine the pixel data of the respective pixel points of each image and the corresponding depth values.
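The up-sampling step may be sketched as follows. The disclosure does not name an interpolation method, so nearest-neighbor replication is used here purely as an assumption; the function name `upsample_depth` and the integer `factor` parameter are hypothetical.

```python
def upsample_depth(depth, factor):
    """Nearest-neighbor up-sampling of a 2-D depth map by an integer factor.

    Assumption: the depth map was down-sampled by the same integer factor
    in both directions, so each depth value is replicated into a
    factor-by-factor block to match the resolution of the pixel data.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in depth:
        # Repeat every value horizontally, then repeat the row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

After up-sampling, the depth value at row `y`, column `x` corresponds to the pixel at the same position in the pixel data.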

Accordingly, rendering and displaying according to the read data combination may be rendering and displaying after performing the image reconstruction based on the determined pixel data of respective pixel points of each image, the corresponding depth values, and the position of the virtual viewpoint to be displayed. For video, the reconstructed images described in the example embodiment of the present disclosure may be the frame images. The frame images are displayed in the order of the frame moments, so that the video is played for the user and the video reconstruction is completed. That is, the video reconstruction may include the reconstruction of frame images in the video. The implementation manners of the reconstruction of frame images are the same as or similar to the reconstruction of images.

In implementations, referring to FIG. 39, a method 3900 for performing the image reconstruction of the virtual viewpoint may include the following steps:

Step S3902, determining parameter data of each image of the multiple synchronized images, where the parameter data includes data of shooting position and shooting angle of the images;

Step S3904, determining parameter data of the virtual viewpoint, where the parameter data of the virtual viewpoint includes a virtual viewing position and a virtual viewing perspective;

Step S3906, determining multiple target images among the multiple synchronized images;

Step S3908, for each target image, projecting the depth data to the virtual viewpoint according to the relationship between the parameter data of the virtual viewpoint and the parameter data of the image;

Step S3910, generating a reconstructed image according to the depth data projected to the virtual viewpoint and the pixel data of the target image.
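The projection in Step S3908 may be illustrated for a single pixel as follows. This is a sketch under assumptions that the disclosure does not state: an ideal pinhole camera without distortion, and parameter data expressed as a world-to-camera rotation matrix and translation vector. The `Camera` class, its field names, and the `reproject` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical parameter data: pinhole intrinsics plus a world-to-camera pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    rotation: list      # 3x3 matrix R, with X_cam = R * X_world + t
    translation: list   # 3-vector t


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def reproject(u, v, depth, source, virtual):
    """Project pixel (u, v) with the given depth from `source` into `virtual`.

    Returns (u', v', depth') in the virtual view, or None if the point
    falls behind the virtual camera.
    """
    # Back-project the pixel to a 3-D point in the source camera frame.
    x_cam = [(u - source.cx) / source.fx * depth,
             (v - source.cy) / source.fy * depth,
             depth]
    # Source camera frame -> world frame (R is orthonormal, so R^-1 = R^T).
    shifted = [a - b for a, b in zip(x_cam, source.translation)]
    x_world = mat_vec(transpose(source.rotation), shifted)
    # World frame -> virtual camera frame.
    rotated = mat_vec(virtual.rotation, x_world)
    x_virtual = [a + b for a, b in zip(rotated, virtual.translation)]
    if x_virtual[2] <= 0:
        return None
    return (virtual.fx * x_virtual[0] / x_virtual[2] + virtual.cx,
            virtual.fy * x_virtual[1] / x_virtual[2] + virtual.cy,
            x_virtual[2])
```

Applying `reproject` to every pixel of a target image's depth map would yield the depth data projected to the virtual viewpoint. Where several source pixels land on the same target position, keeping the smallest depth is a common way to respect occlusion, although the disclosure does not prescribe it.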

Generating the reconstructed image may further include determining the pixel value of each pixel point of the reconstructed image. In an example embodiment, for each pixel point, if all of the pixel data projected to the virtual viewpoint is 0, the surrounding pixel data from one or more target images may be used for inpainting. For each pixel point, if the pixel data projected to the virtual viewpoint includes multiple non-zero data, the weight value of the respective data may be determined, and the value of the pixel point is finally determined.

In an example embodiment of the present disclosure, when generating the reconstructed image, the forward projection may be performed first, and the depth information is used to project a corresponding group of texture images in the image combination of the video frame to the three-dimensional Euclidean space. That is, the depth maps of the corresponding group are respectively projected to the position of the virtual viewpoint at the user interaction moment according to the spatial geometric relationship, to form the virtual viewpoint position depth map. Then, the backward projection is performed to project the three-dimensional spatial points onto the imaging plane of the virtual camera, that is, copying from the pixel points in the texture images of the corresponding group to the generated virtual texture images corresponding to the position of the virtual viewpoint according to the projected depth map, to form the virtual texture images corresponding to the corresponding group. Next, the virtual texture images corresponding to the corresponding group are fused to obtain the reconstructed image of the position of the virtual viewpoint at the user interaction moment. With the above method for reconstructing the image, the sampling accuracy of the reconstructed image may be improved.

Before the forward projection is performed, preprocessing may be performed first. In an example embodiment, according to the parameter data corresponding to the corresponding group in the image combination of the video frame, the depth value of forward projection and the homography matrix of the texture backward projection may be calculated first. In implementations, the Z transformation may be used to convert the depth level into the depth value.

During the forward projection of the depth map, a projection formula may be used to project the depth maps of the corresponding group to the depth maps of the position of the virtual viewpoint, and then the depth values of the corresponding position are copied. In addition, the depth maps of the corresponding group may have noise, and some sampled signals may be included in the projecting process, so the generated depth maps of the position of the virtual viewpoint may have small noise holes. To address this problem, median filtering may be used to remove the noise.
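The median filtering mentioned above may be sketched as follows. The window size is not given in the disclosure, so a 3x3 window is assumed, and holes are assumed to be marked as 0; the function name `median_filter_3x3` is hypothetical.

```python
def median_filter_3x3(depth):
    """Apply a 3x3 median filter to a 2-D depth map (list of rows).

    At the borders only the neighbors that exist are used. For an even
    number of samples the upper of the two middle values is taken, so the
    output never contains a value that was absent from the input.
    """
    height, width = len(depth), len(depth[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                depth[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out
```

An isolated hole surrounded by valid depth values is replaced by the median of its neighborhood, while larger holes survive and remain to be handled by the later hole processing.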

In implementations, other postprocessing may also be performed on the depth maps of the position of the virtual viewpoint obtained after the forward projection according to needs, to further improve the quality of the generated reconstructed image. In an example embodiment of the present disclosure, before the backward projection is performed, the front and back view occlusion relationship of the depth maps of the position of the virtual viewpoint obtained by the forward projection is processed, so that the generated depth maps may more truly reflect the positional relationship of objects in the scenario viewed at the position of the virtual viewpoint.

For the backward projection, for example, the position of the corresponding group of texture images in the virtual texture images may be calculated according to the depth maps of the position of the virtual viewpoint obtained by the forward projection. Next, the texture values corresponding to the pixel positions are copied, where holes in the depth maps may be marked as 0 or as no texture value in the virtual texture images. For the area marked as the hole, the hole expansion may be performed to avoid synthetic illusion.

Next, the generated virtual texture images of the corresponding groups are fused to obtain the reconstructed image of the position of the virtual viewpoint at the user interaction moment. In implementations, the fusion may also be performed in various manners. The following two example embodiments are used for illustration.

In an example embodiment of the present disclosure, weighting processing is performed first, and then inpainting is performed. In an example embodiment, the weighting processing is performed on pixels in corresponding positions in the virtual texture images corresponding to the respective corresponding groups in the image combination of video frames at the time of user interaction, to obtain the pixel values of corresponding positions in the reconstructed image of the position of the virtual viewpoint at the user interaction moment. Next, for the position where the pixel value is zero in the reconstructed image at the position of the virtual viewpoint at the user interaction moment, the pixels around the pixels in the reconstructed image are used to perform the inpainting, to obtain the reconstructed image of viewpoint position at the user interaction moment.

In another example embodiment of the present disclosure, inpainting is performed first, and then weighting processing is performed. In an example embodiment, for the position where the pixel value is zero in the virtual texture images corresponding to the respective corresponding groups in the image combination of the video frames at the time of user interaction, the surrounding pixel values are used respectively to perform inpainting. Next, after the inpainting, the weighting processing is performed on the pixel values in corresponding positions in the virtual texture images corresponding to the respective corresponding groups, to obtain the reconstructed image of the position of the virtual viewpoint at the time of the user interaction.

The weighting processing in the above example embodiment may use the weighted average method, or may use different weighting coefficients according to parameter data or the positional relationship between the shooting device and the virtual viewpoint. In an example embodiment of the present disclosure, the weighting is performed according to the reciprocal of the distance between the position of the virtual viewpoint and the positions of the respective capturing devices, i.e., the closer a capturing device is to the position of the virtual viewpoint, the greater its weight is.
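The reciprocal-distance weighting may be sketched as follows. The function names, the normalization of the weights so that they sum to 1, the small `epsilon` guard against a zero distance, and the choice to leave zero-valued (hole) samples out of the weighted average are assumptions made for illustration only.

```python
import math


def inverse_distance_weights(virtual_position, device_positions, epsilon=1e-9):
    """Return normalized weights proportional to 1 / distance.

    `epsilon` (an assumption) avoids division by zero when the virtual
    viewpoint coincides with a capturing device.
    """
    raw = [1.0 / (math.dist(virtual_position, position) + epsilon)
           for position in device_positions]
    total = sum(raw)
    return [weight / total for weight in raw]


def fuse_pixel(values, weights):
    """Weighted average of the non-zero samples at one pixel position.

    Returns 0 when every sample is a hole, leaving that position for
    inpainting.
    """
    valid = [(value, weight) for value, weight in zip(values, weights)
             if value != 0]
    if not valid:
        return 0
    total = sum(weight for _, weight in valid)
    return sum(value * weight for value, weight in valid) / total
```

For a virtual viewpoint at distance 1 from one capturing device and distance 3 from another, the weights come out close to 0.75 and 0.25, so the nearer device dominates the fused value.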

In implementations, the inpainting may be performed with a preset inpainting algorithm according to needs, and details thereof are not described herein again.

In implementations, the data of shooting position and shooting angle of the image may be referred to as external parameter data. The parameter data may further include internal parameter data, i.e., attribute data of the image shooting device. The distortion parameters and the like may be reflected by the internal parameter data, and the projection relationship may be determined more accurately with reference to the internal parameters.

In implementations, the parameter data may be obtained from the data file, and specifically may be obtained from the corresponding storage space according to the storage address of the parameter data in the data header file.

In implementations, the determining of the target images may be selecting multiple images of which the viewpoints are close to the coordinate position of the virtual viewpoint based on the 6 degrees of freedom coordinates of the virtual viewpoint and the 6 degrees of freedom coordinates of the virtual viewer's viewpoint at the image shooting position, i.e., the 6 degrees of freedom coordinates of the image viewpoint.
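One possible selection rule may be sketched as follows. It assumes that the first three components of each 6 degrees of freedom coordinate are the spatial position, and that closeness is measured by Euclidean distance between positions only; the function name `select_target_images` and the parameter `count` are hypothetical.

```python
import math


def select_target_images(virtual_6dof, image_6dofs, count):
    """Return the indices of the `count` images closest to the virtual viewpoint.

    Assumption: each 6DoF tuple starts with (x, y, z); the three angular
    components are ignored by this simple distance measure.
    """
    def position_distance(index):
        return math.dist(virtual_6dof[:3], image_6dofs[index][:3])

    ranked = sorted(range(len(image_6dofs)), key=position_distance)
    return ranked[:count]
```

Setting `count` to the total number of images corresponds to using all of the synchronized images as target images.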

In implementations, all images in the multiple synchronized images may also be used as the target images. Selecting more images as the target images may make the quality of the reconstructed image higher. The selection of the target images may be determined according to needs, and is not limited herein.

As described above, the depth data may be a set of depth values corresponding to the pixels of the image on a one-to-one basis. The depth data projected to the virtual viewpoint is also data corresponding to the pixels of the image on a one-to-one basis. To generate the reconstructed image, for example, for each pixel position, according to the depth data respectively, the corresponding position data is obtained from the pixel data of the target image to generate the reconstructed image. When the data is obtained from multiple target images for one pixel position, multiple data may be weighted to improve the quality of the reconstructed image.

Those skilled in the art may understand that, based on the multi-angle free-perspective image data in the example embodiment of the present disclosure, the process of reconstructing the image of the virtual viewpoint may be various, and is not limited herein.

The terminology explanations, implementation manners, and beneficial effects involved in the method for processing multi-angle free-perspective data may refer to other example embodiments. Moreover, various implementations of the multi-angle free-perspective interaction method may be implemented in combination with other example embodiments.

The multi-angle free-perspective data described above may be multi-angle free-perspective image data. Hereinafter, the multi-angle free-perspective image data processing is described.

FIG. 40 is a flowchart of a multi-angle free-perspective image data processing method 4000 in an example embodiment of the present disclosure, which may include the following steps:

Step S4002, acquiring the data combination stored in the picture format, where the data combination includes the pixel data and the depth data of the multiple synchronized images, and the multiple synchronized images have different perspectives with respect to the to-be-viewed area;

Step S4004, performing image reconstruction of the virtual viewpoint based on the data combination, where the virtual viewpoint is selected from the multi-angle free-perspective range, and the multi-angle free-perspective range is the range supporting virtual viewpoint switching viewing in the to-be-viewed area.

For the manner of acquiring the data combination in the picture format, the implementation manners in the above example embodiments may be used. The data combination may be obtained by parsing the data header file and reading the data file. The manner of image reconstruction of the virtual viewpoint may also refer to the above description.

In implementations, acquiring the data combination stored in the picture format and performing image reconstruction of the virtual viewpoint may be completed by an edge computing node. As described above, the edge computing node may be a node that performs short-range communication with the display device that displays the reconstructed image and maintains a high-bandwidth and low-latency connection, such as the connection via Wi-Fi, 5G, and the like. In an example embodiment, the edge computing node may be a base station, a mobile device, an in-vehicle device, or a home router with sufficient computing power. Referring to FIG. 3, the edge computing node may be a device located in the CDN.

Accordingly, before the image reconstruction of the virtual viewpoint is performed, the parameter data of the virtual viewpoint may also be received. After the image reconstruction of the virtual viewpoint is performed, the reconstructed image may also be sent to the device that performs displaying.

Reconstructing the image through an edge computing node may reduce the requirements on the display device. Devices with lower computing capabilities may also receive the user instruction to provide the user with the multi-angle free-perspective experience.

For example, in the 5G scenario, the communication speed between the user equipment (UE) and the base station, especially the base station of the current serving cell, is relatively fast. The user may determine the parameter data of the virtual viewpoint by instructing the user equipment. The base station of the current serving cell is used as the edge computing node to calculate the reconstructed image. The device that performs displaying may receive the reconstructed image to provide the user with the multi-angle free-perspective service.

Those skilled in the art may understand that, in implementations, the device that performs image reconstruction and the device that performs displaying may also be the same device. The device may receive the user instruction and determine the virtual viewpoint based on the user instruction in real time. After the image of the virtual viewpoint is reconstructed, the reconstructed image may be displayed.

In implementations, the manners of receiving the user instruction and generating the virtual viewpoint according to the user instruction may be various, where the virtual viewpoint is a viewpoint within the free-perspective range. Therefore, in the example embodiment of the present disclosure, the user may be supported to freely switch the virtual viewpoint within the multi-angle free-perspective range.

Those skilled in the art may understand that the terminology explanations, implementation manners, and beneficial effects involved in the multi-angle free-perspective image data processing method may refer to other example embodiments. Moreover, various implementations of the multi-angle free-perspective interaction method may be implemented in combination with other example embodiments.

The multi-angle free-perspective data described above may also be multi-angle free-perspective video data. Hereinafter, the multi-angle free-perspective video data processing is described.

FIG. 41 is a flowchart of a method 4100 for processing multi-angle free-perspective video data in an example embodiment of the present disclosure, which may include the following steps:

Step S4102, parsing the acquired video data to obtain data combinations at different frame moments, where the data combination includes the pixel data and the depth data of the multiple synchronized images, and the multiple synchronized images have different perspectives with respect to the to-be-viewed area;

Step S4104, for each frame moment, performing the image reconstruction of the virtual viewpoint based on the data combination, where the virtual viewpoint is selected from the multi-angle free-perspective range, and the multi-angle free-perspective range is the range supporting the virtual viewpoint switching viewing of the to-be-viewed area, and the reconstructed image is used for video playing.

In implementations, the format of the acquired video data may be various. The acquired video data may be decapsulated and decoded based on the video format, to obtain frame images at different frame moments. The data combination may be obtained from the frame image. That is, the frame image may store the pixel data and the depth data of the multiple synchronized images. From this perspective, the frame image may also be referred to as the stitched image.

The video data may be obtained from the data file according to the data header file. The implementation manner of acquiring the data combination may refer to the above description. The implementation manner of image reconstruction of the virtual viewpoint may also refer to the above description. After the reconstructed image at each frame moment is obtained, the video may be played according to the order of the frame moments.

In implementations, acquiring data combinations at different frame moments and performing image reconstruction of the virtual viewpoint may be completed by the edge computing node.

Accordingly, before the image reconstruction of the virtual viewpoint is performed, the parameter data of the virtual viewpoint may also be received. After the image reconstruction of the virtual viewpoint is performed, the reconstructed images at respective frame moments may be sent to the device that performs displaying.

Those skilled in the art may understand that, in implementations, the device that performs image reconstruction and the device that performs displaying may also be the same device.

Those skilled in the art may understand that the terminology explanations, implementation manners, and beneficial effects involved in the method for processing multi-angle free-perspective video data may refer to other example embodiments. Moreover, various implementations of the multi-angle free-perspective interaction method may be implemented in combination with other example embodiments.

Hereinafter, the multi-angle free-perspective interaction method is further described.

FIG. 42 is a flowchart of a multi-angle free-perspective interaction method 4200 in an example embodiment of the present disclosure, which may include the following steps:

Step S4202, receiving a user instruction;

Step S4204, determining the virtual viewpoint according to the user instruction, where the virtual viewpoint is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is the range supporting the virtual viewpoint switching viewing of the to-be-viewed area;

Step S4206, displaying the display content for viewing the to-be-viewed area based on the virtual viewpoint, where the display content is generated based on the data combination and the virtual viewpoint, and the data combination includes the pixel data and the depth data of the multiple synchronized images, and there is an association relationship between the pixel data and the depth data of each image, and the multiple synchronized images have different perspectives with respect to the to-be-viewed area.

In the example embodiment of the present disclosure, the virtual viewpoint may be a viewpoint within the multi-angle free-perspective range. The specific multi-angle free-perspective range may be associated with the data combination.

In implementations, the user instruction may be received, and the virtual viewpoint may be determined within the free-perspective range according to the user instruction. The user instruction and the manner of determining the virtual viewpoint according to the user instruction may be various. Hereinafter, further illustrations are described.

In implementations, determining the virtual viewpoint according to the user instruction may include determining the basic viewpoint for viewing the to-be-viewed area, where the basic viewpoint includes the position and the perspective of the basic viewpoint. At least one of the position and the perspective of the virtual viewpoint may be changed based on the basic viewpoint. There is an association relationship between the user instruction and the manner of the change. Under the user instruction, the virtual viewpoint is determined according to the user instruction, the basic viewpoint, and the above association relationship, with the basic viewpoint as the base.

The basic viewpoint may include the position and the perspective from which the user views the to-be-viewed area. Further, the basic viewpoint may be the position and the perspective corresponding to the picture displayed by the device that performs displaying when the user instruction is received. For example, if the image displayed by the device when the user instruction is received is as shown in FIG. 4, the position of the basic viewpoint may be VP₁ as shown in FIG. 2. Those skilled in the art may understand that the position and the perspective of the basic viewpoint may be preset. Alternatively, the basic viewpoint may also be the virtual viewpoint determined in advance according to the user instruction. The basic viewpoint may also be expressed with 6DoF coordinates. The association relationship between the user instruction and the change of the virtual viewpoint based on the basic viewpoint may be a preset association relationship.

In implementations, various manners of receiving the user instruction may exist, which are described respectively hereinafter.

In implementations, a path of a touchpoint on the touch-sensitive screen may be detected. The path may include a starting point, an ending point, and a moving direction of the touchpoint. The path is used as the user instruction.

Accordingly, the association relationship between the path and the changing manner of the virtual viewpoint based on the basic viewpoint may also be various.

For example, there may be two paths, where the touchpoint of at least one of the two paths moves in a direction away from the other touchpoint, and then the position of the virtual viewpoint moves in a direction toward the to-be-viewed area.

Referring to FIG. 43 and FIG. 11, the vector F₁ and the vector F₂ in FIG. 43 may respectively illustrate two paths. Under these paths, if the basic viewpoint is B₂ in FIG. 11, the virtual viewpoint may be B₃. That is, for the user, the to-be-viewed area is zoomed in.

Those skilled in the art may understand that FIG. 43 is merely for illustration. In specific application scenarios, the starting points, the ending points, and the directions of the two paths may be various, as long as the touchpoint of at least one of the two paths moves in a direction away from the other touchpoint. One of the two paths may be a path of the touchpoint that does not move, and only includes the starting point.

In an example embodiment of the present disclosure, the display image before zooming in may be as shown in FIG. 4, and the image after zooming in may be as shown in FIG. 44.

In implementations, the center point of zooming in may be determined according to the position of the touchpoint. Alternatively, with a preset point as the center point, the image may be zoomed in about the center point. The rate of zooming in, i.e., the magnitude of the virtual viewpoint movement, may be associated with the magnitude by which the touchpoints in the two paths move away from each other. The association relationship may be preset.

In implementations, if the touchpoint of at least one of the two paths moves in a direction toward the other touchpoint, the position of the virtual viewpoint may move in a direction away from the to-be-viewed area.

Referring to FIG. 45 and FIG. 11, the vector F₃ and the vector F₄ in FIG. 45 may respectively illustrate two paths. Under these paths, if the basic viewpoint is B₃ in FIG. 11, the virtual viewpoint may be B₂. That is, for a user, the to-be-viewed area is zoomed out.

Those skilled in the art may understand that FIG. 45 is merely for illustration. In specific application scenarios, the starting points, the ending points, and the directions of the two paths may be various, as long as the touchpoint of at least one of the two paths moves in a direction toward the other touchpoint. One of the two paths may be a path of the touchpoint that does not move, and only includes the starting point.

In an example embodiment of the present disclosure, the display image before zooming out may be as shown in FIG. 44, and the image after zooming out may be as shown in FIG. 4.

In implementations, the center point of zooming out may be determined according to the position of the touchpoint. Alternatively, with a preset point as the center point, the image may be zoomed out about the center point. The rate of zooming out, i.e., the magnitude of the virtual viewpoint movement, may be associated with the magnitude by which the touchpoints in the two paths move toward each other. The association relationship may be preset.
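The two-path association described above may be sketched as follows. As a simplifying assumption, the sketch compares the distance between the two touchpoints at their starting points with the distance at their ending points, rather than inspecting each moving direction separately. The function name `classify_two_paths`, the `(start, end)` path representation, and the `tolerance` parameter are hypothetical.

```python
import math


def classify_two_paths(path_a, path_b, tolerance=1e-6):
    """Classify two touch paths as a zoom-in or zoom-out instruction.

    Each path is a (start, end) pair of (x, y) points; a touchpoint that
    does not move has start == end. Returns (action, magnitude), where the
    magnitude is the change in distance between the touchpoints and can be
    mapped to the movement of the virtual viewpoint by a preset relationship.
    """
    start_gap = math.dist(path_a[0], path_b[0])
    end_gap = math.dist(path_a[1], path_b[1])
    change = end_gap - start_gap
    if change > tolerance:
        return "zoom_in", change      # touchpoints moved apart
    if change < -tolerance:
        return "zoom_out", -change    # touchpoints moved toward each other
    return "none", 0.0
```

A stationary first touchpoint combined with a second touchpoint that moves away from it is classified as zooming in, matching the case where one of the two paths only includes the starting point.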

In implementations, the association relationship between the path and the changing manner of the virtual viewpoint based on the basic viewpoint may also include the following: there is one path, the moving distance of the touchpoint is associated with the change magnitude of the perspective, and the moving direction of the touchpoint is associated with the direction of change of the perspective.

For example, with reference to FIG. 5 and FIG. 13, if the received user instruction is one path, the vector D₅₂ in FIG. 5 is used for illustration. If the basic viewpoint is the point C₂ in FIG. 13, the virtual viewpoint may be the point C₁.

In an example embodiment of the present disclosure, the display before the perspective switching may refer to FIG. 5. The display of the display device after the perspective switching may be as shown in FIG. 6.

If the received user instruction is one path, for example, as illustrated by the vector D₈₁ in FIG. 8, and the basic viewpoint is the point C₂ in FIG. 13, the virtual viewpoint may be the point C₃.

In an example embodiment of the present disclosure, the display before the perspective switching may refer to FIG. 8. The display of the display device after the perspective switching may be as shown in FIG. 9.

It may be understood by those skilled in the art that the above example embodiments are merely qualitative illustrations, and do not limit the association between the user instruction and the virtual viewpoint.
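The single-path association, in which the moving distance sets the change magnitude and the moving direction sets the direction of change, may be sketched as follows. The linear mapping, the `degrees_per_pixel` constant, and the interpretation of horizontal and vertical movement as yaw and pitch are assumptions for illustration; the disclosure only states that the association relationship may be preset.

```python
def perspective_change(start, end, degrees_per_pixel=0.1):
    """Map one touch path to a (yaw_delta, pitch_delta) change in degrees.

    The sign of each component follows the moving direction of the
    touchpoint, and its size grows linearly with the moving distance.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return dx * degrees_per_pixel, dy * degrees_per_pixel
```

The resulting deltas would be added to the perspective of the basic viewpoint to obtain the perspective of the virtual viewpoint.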

In implementations, the user instruction may include a voice control instruction. The voice control instruction may be in a format of natural language, such as “zoom in”, “re-zoom in”, “leftward perspective”, and the like. Accordingly, to determine the virtual viewpoint according to the user instruction, voice recognition may be performed on the user instruction, and the virtual viewpoint may then be determined, with the basic viewpoint as the base, according to the preset association relationship between the instruction and the changing manner of the virtual viewpoint based on the basic viewpoint.
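The preset association between recognized instructions and viewpoint changes may be sketched as a lookup table. The sketch assumes that voice recognition has already produced text; the table contents, the viewpoint keys `distance` and `yaw`, the step sizes, and the function name `apply_voice_instruction` are all hypothetical.

```python
# Hypothetical preset association: recognized text -> change relative to
# the basic viewpoint. "distance" is the distance to the to-be-viewed area.
PRESET_CHANGES = {
    "zoom in": {"distance": -1.0},
    "leftward perspective": {"yaw": -15.0},
}


def apply_voice_instruction(basic_viewpoint, recognized_text,
                            changes=PRESET_CHANGES):
    """Return the virtual viewpoint obtained from the basic viewpoint.

    Unknown instructions leave the viewpoint unchanged. The input
    dictionary is never modified.
    """
    virtual_viewpoint = dict(basic_viewpoint)
    delta = changes.get(recognized_text.strip().lower())
    if delta is None:
        return virtual_viewpoint
    for key, amount in delta.items():
        virtual_viewpoint[key] = virtual_viewpoint.get(key, 0.0) + amount
    return virtual_viewpoint
```

A real system would also need to clamp the result to the multi-angle free-perspective range, which this sketch omits.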

In implementations, the user instruction may also include the selection of the preset viewpoint for viewing the to-be-viewed area. The preset viewpoints may vary with the to-be-viewed area. The preset viewpoint may include the position and the perspective. For example, if the to-be-viewed area is the basketball game area, the position of the preset viewpoint may be set under the backboard, such that when the user is viewing, the user has the perspective of the audience on the sideline, or the perspective of the coach. Accordingly, the preset viewpoint may be used as the virtual viewpoint.

In implementations, the user instruction may further include the selection of an object in the to-be-viewed area. The object may be determined through image recognition technology. For example, in the basketball game, respective players in the game scenario may be identified according to face recognition technology. The user is provided with options for relevant players. According to the user's selection of the player, the virtual viewpoint may be determined, and the picture under the virtual viewpoint is provided to the user.

In implementations, the user instruction may further include at least one of the position and the perspective of the virtual viewpoint. For example, 6DoF coordinates of the virtual viewpoint may be directly input.

In implementations, various manners of receiving the user instruction may exist. For example, the various manners may be detecting the signal of the touchpoint on the touch-sensitive screen, detecting the signal of the acoustic and electrical sensor, detecting signals of sensors that can reflect the attitude of the device such as the gyroscope, the gravity sensor, and the like. The corresponding user instruction may be the path of the touchpoint on the touch-sensitive screen, the voice control instruction, the gesture operation, etc. The content instructed by the user may also be various, for example, various manners of indicating the changing manner of the virtual viewpoint based on the basic viewpoint, indicating the preset viewpoint, indicating the viewing object, or directly indicating at least one of the position and the perspective of the virtual viewpoint. Implementation manners of determining the virtual viewpoint according to a user instruction may also be various.

In an example embodiment, with reference to the above manner of receiving the user instruction, the detection of the above various sensing devices may be performed at a preset time interval. The time interval corresponds to the frequency of detection. For example, the detection may be performed at a frequency of 25 times per second to obtain the user instruction.
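The correspondence between the time interval and the detection frequency is simply a reciprocal: 25 detections per second corresponds to an interval of 1/25 s, that is, 40 ms. A short sketch, with hypothetical function names, computes the interval and the detection moments within a given duration:

```python
from typing import List


def detection_interval_seconds(frequency_hz: float) -> float:
    """Return the preset time interval for a given detection frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


def detection_times(frequency_hz: float, duration_s: float) -> List[float]:
    """Return the moments, in seconds, at which the sensing devices are sampled."""
    count = int(duration_s * frequency_hz)
    return [k / frequency_hz for k in range(count)]
```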

Those skilled in the art may understand that the manner of receiving the user instruction, the content of the user instruction, and the manner of determining the virtual viewpoint according to the user instruction may be combined or replaced, which is not limited herein.

In implementations, after a trigger instruction is received, the user instruction may also be received in response to the trigger instruction, so that accidental operation by the user may be avoided. The trigger instruction may be a click on a preset button in the screen area. Alternatively, a voice control signal may be used as the trigger instruction. Alternatively, the above manners available for the user instruction, or other manners, may be used.

In implementations, the user instruction may be received during the process of playing the video or displaying the image. When the user instruction is received during the process of displaying the image, the data combination may be the data combination corresponding to the image. When the user instruction is received during the process of playing the video, the data combination may be the data combination corresponding to the frame image in the video. The display content for viewing the to-be-viewed area based on the virtual viewpoint may be the image reconstructed based on the virtual viewpoint.

During the process of playing the video, after the user instruction of generating the virtual viewpoint is received, the display content for viewing the to-be-viewed area based on the virtual viewpoint may be multiple reconstructed frame images generated based on the virtual viewpoint. That is, during the process of switching the virtual viewpoint, the video may be continuously played. Before the virtual viewpoint is re-determined according to the user instruction, the video may be played with the original virtual viewpoint. After the virtual viewpoint is re-determined, the reconstructed frame images based on the virtual viewpoint may be generated and played at the position and perspective of the switched virtual viewpoint.

Further, during the process of playing the video, after the user instruction of generating the virtual viewpoint is received, the display content for viewing the to-be-viewed area based on the virtual viewpoint may be multiple reconstructed frame images based on the virtual viewpoint. That is, during the process of switching the virtual viewpoint, the video may be continuously played. Before the virtual viewpoint is determined, the video may be played in the original configuration. After the virtual viewpoint is determined, the reconstructed frame image based on the virtual viewpoint may be generated and played with the position and the perspective of the switched viewpoint. Alternatively, the video playing may be paused to switch the virtual viewpoint.

Referring to FIG. 4 and FIG. 6, during the process of image displaying, the user instruction may be received. The virtual viewpoint may be generated according to the user instruction to switch the view. The display content may be switched from the image as shown in FIG. 4 to the image as shown in FIG. 6.

When the video is played to the frame image as shown in FIG. 4, the virtual viewpoint is switched, and the frame image as shown in FIG. 6 is displayed. Before a new user instruction is received, the frame image based on the virtual viewpoint may be continuously displayed for video playing. For example, when the frame image as shown in FIG. 46 is played, the new user instruction is received, and the virtual viewpoint may be switched according to the user instruction to continue the video playing.

Those skilled in the art may understand that the terminology explanations, implementation manners, and beneficial effects involved in the multi-angle free-perspective interaction method may refer to other example embodiments. Moreover, various implementations of the multi-angle free-perspective interaction method may be implemented in combination with other example embodiments.

Hereinafter, another method for multi-angle free-perspective interaction is further described.

FIG. 47 is a flowchart of a video generating method 4700 in an example embodiment of the present disclosure, which may include the following steps:

Step S4702, acquiring a virtual viewpoint trajectory in response to a recording instruction, where the virtual viewpoint trajectory is a set of virtual viewpoints arranged according to a chronological order, and each virtual viewpoint of the set of virtual viewpoints is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is a range supporting viewpoint switching viewing in the to-be-viewed area;

Step S4704, recording a rendered picture under the viewpoint trajectory, where the rendered picture under the viewpoint trajectory is a picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory.

In implementations, the trajectory of the virtual viewpoint may be various. For example, referring to FIG. 13, the user may continuously slide on the touch-sensitive screen to indicate the change of the perspective. The virtual viewpoint trajectory may be multiple positions on the arc from C₂ to C₁ along the sphere. The number of specific positions may be related to the speed of the user's sliding. In this scenario, the user experience may be that the perspective continuously changes when viewing the to-be-viewed area. Taking FIG. 5 and FIG. 6 as examples, at the perspective of the user, the viewed picture may be continuously switched from the scenario as shown in FIG. 5 to the scenario as shown in FIG. 6.

Alternatively, the user may directly instruct the position of the virtual viewpoint in the manners of the above example embodiments. For example, the position of the virtual viewpoint before the instruction is located at C₂, and the position of the virtual viewpoint after the instruction is C₁. Then, in the virtual viewpoint trajectory, the positions of two continuous virtual viewpoints may be positions C₂ and C₁. In this scenario, the user experience may be similar to the experience brought by switching the camera position in cinematography works. Still taking FIG. 5 and FIG. 6 as examples, at the viewing angle of the user, the viewed picture may be directly switched from the scenario as shown in FIG. 5 to the scenario as shown in FIG. 6.
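The two kinds of trajectory described above, a continuous sweep along the arc and a direct cut between two positions, can be contrasted in a small sketch. Here a viewpoint is reduced to a single azimuth angle on the sphere, and the function names, the linear spacing, and the 0.04 s step are assumptions for illustration only.

```python
from typing import List, Tuple

# One trajectory entry: (time in seconds, azimuth in degrees on the sphere).
Entry = Tuple[float, float]


def arc_trajectory(start_deg: float, end_deg: float, steps: int,
                   start_time: float = 0.0, step_time: float = 0.04) -> List[Entry]:
    """Continuous change: `steps` positions spread evenly along the arc.

    A slower slide by the user would correspond to a larger `steps` value.
    """
    if steps < 2:
        raise ValueError("an arc needs at least two positions")
    span = end_deg - start_deg
    return [(start_time + i * step_time, start_deg + span * i / (steps - 1))
            for i in range(steps)]


def cut_trajectory(start_deg: float, end_deg: float,
                   start_time: float = 0.0, step_time: float = 0.04) -> List[Entry]:
    """Direct switch: two consecutive viewpoints, like a camera cut."""
    return [(start_time, start_deg), (start_time + step_time, end_deg)]
```

Both functions return viewpoints already arranged in chronological order, so either result can serve as a virtual viewpoint trajectory in the sense of step S4702.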

In implementations, the user instruction may be received during the process of playing a video or displaying an image. Those skilled in the art may understand that prior to or after receiving the recording instruction, a user instruction may be received. After the recording instruction is received, the images displayed on the screen are recorded.

Those skilled in the art may understand that the images displayed on the screen are in response to the user instructions. That is, images for viewing the to-be-viewed area based on the virtual viewpoint are displayed in a chronological order of the user instructions.

In implementations, the above images for viewing the to-be-viewed area based on the virtual viewpoint may be obtained after image reconstruction based on a data combination and the virtual viewpoint. The data combination may include pixel data and depth data of multiple synchronized images. There is an association relationship between the image data and depth data of each image. The multiple synchronized images have different perspectives with respect to the to-be-viewed area. There are various manners of acquiring the data combination, performing the image reconstruction based on the data combination and the virtual viewpoint, and implementations for generating the reconstructed image, and details are as described above.
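One way to picture the data combination is as a container of synchronized images, each carrying pixel data and depth data. The sketch below is illustrative only: the class names, the use of nested lists, and the interpretation of the association relationship as a per-pixel correspondence between the two arrays are assumptions, not details stated in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SynchronizedImage:
    camera_id: str             # identifies the perspective (hypothetical field)
    pixels: List[List[int]]    # pixel data, one value per pixel (assumed grayscale)
    depth: List[List[float]]   # depth data, assumed one value per pixel

    def is_associated(self) -> bool:
        """Check that pixel data and depth data have matching dimensions."""
        return (len(self.pixels) == len(self.depth)
                and all(len(p) == len(d) for p, d in zip(self.pixels, self.depth)))


@dataclass
class DataCombination:
    images: List[SynchronizedImage]

    def validate(self) -> bool:
        """Require multiple images, distinct perspectives, and associated data."""
        ids = {image.camera_id for image in self.images}
        return (len(self.images) >= 2
                and len(ids) == len(self.images)
                and all(image.is_associated() for image in self.images))
```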

During the process of displaying an image, the user instruction is received, and the data combination may be a data combination corresponding to the image. During the process of playing a video, the user instruction is received, and the data combination may be a data combination corresponding to a frame image in the video. The display content for viewing the to-be-viewed area based on the virtual viewpoint may be a reconstructed image based on the virtual viewpoint.

During the process of playing the video, when the virtual viewpoint is generated after the user instruction is received, the display content for viewing the to-be-viewed area based on the virtual viewpoint may be multiple reconstructed frame images generated based on the virtual viewpoint. That is, during the process of switching the virtual viewpoint, the video may be continuously played. Before the virtual viewpoint is re-determined according to the user instruction, the video may be played with the original virtual viewpoint. After the virtual viewpoint is re-determined, the reconstructed frame images based on the virtual viewpoint may be generated and played at the position and perspective of the switched virtual viewpoint.

With reference to FIGS. 5 and 6, during the process of displaying the image, the user instruction may be received. The virtual viewpoint may be generated to switch the viewing viewpoint according to the user instruction. The display content may be changed from the image as shown in FIG. 5 to the image as shown in FIG. 6.

During the process of playing the video, when the frame image as shown in FIG. 6 is played, the virtual viewpoint may be switched according to the user instruction to display the frame image as shown in FIG. 4. Until a new user instruction is received, frame images based on the virtual viewpoint may be continuously displayed, such as the frame image as shown in FIG. 46.

In implementations, during the process of playing the video, the playing of the video may also be paused, and any one or more of the position or the perspective of the virtual viewpoint of the frame image at a specified frame moment may be switched.

By selecting the frame moment, the user may determine the picture of interest to perform viewpoint switching within the multi-angle free-perspective range. The user may select a picture that has been played. Alternatively, in the scenario of recording and playback, the user may also select the subsequent picture yet to be played. The selection is more flexible, and the user experience is better.

With reference to FIG. 4, FIG. 6, and FIG. 46, as an example, when the user is viewing a video, the user may see the shooting pictures as shown in FIG. 6 to FIG. 46. The user may instruct in multiple ways and select the frame moment to switch. For example, the user selects the picture as shown in FIG. 6 to change the perspective. For example, the user may switch the viewpoint and view the picture as shown in FIG. 4 after switching.

In implementations, recording the rendered picture under the virtual viewpoint trajectory may be acquiring the frame image displayed at each frame moment according to a display frame rate. That is, the frame image displayed to the user every frame may be stored. Next, the frame image displayed at each frame moment may be compressed in the video format.
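A minimal sketch of this recording step follows, assuming a timestamped trajectory and a caller-supplied `render` callable that stands in for image reconstruction (both hypothetical). At each frame moment, the most recently determined virtual viewpoint is held until a later one takes effect, and the frame rendered under it is stored. Compression into a video format is left to an external encoder and is not shown.

```python
import bisect
from typing import Any, Callable, List, Tuple


def viewpoint_at(trajectory: List[Tuple[float, Any]], t: float) -> Any:
    """Return the viewpoint in effect at time t.

    `trajectory` holds (time, viewpoint) pairs in chronological order; the
    latest entry with time <= t applies, and the first entry applies before that.
    """
    times = [timestamp for timestamp, _ in trajectory]
    index = bisect.bisect_right(times, t) - 1
    return trajectory[max(index, 0)][1]


def record(trajectory: List[Tuple[float, Any]],
           render: Callable[[Any, int], Any],
           frame_rate: float,
           duration_s: float) -> List[Any]:
    """Store the frame displayed at each frame moment of the display frame rate."""
    frame_count = int(round(duration_s * frame_rate))
    return [render(viewpoint_at(trajectory, k / frame_rate), k)
            for k in range(frame_count)]
```

For example, with a trajectory `[(0.0, "C2"), (0.1, "C1")]`, a frame rate of 25, and a duration of 0.2 s, five frames are stored: the first three under C2 and the last two under C1.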

Example embodiments of the present disclosure further provide a video generating apparatus 4800. As shown in FIG. 48, the apparatus 4800 may include one or more processors 4802, an input/output module 4804, a communication module 4806, and a memory 4808. The input/output module 4804 is configured to receive data/signal to be processed and to output the processed data/signal. The communication module 4806 is configured to allow the apparatus 4800 to communicate with other devices (not shown) over a network (not shown). The memory 4808 stores thereon computer-executable modules executable by the one or more processors 4802. The computer-executable modules may include the following:

A virtual viewpoint trajectory acquiring unit 4810, adapted to acquire a virtual viewpoint trajectory in response to a recording instruction, where the virtual viewpoint trajectory is a set of virtual viewpoints arranged according to a chronological order, and each virtual viewpoint of the set of virtual viewpoints is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is a range supporting viewpoint switching viewing in the to-be-viewed area;

A rendered picture recording unit 4812, adapted to record a rendered picture under the viewpoint trajectory, where the rendered picture under the viewpoint trajectory is a picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory.

In implementations, the video generating apparatus may further include a compressing unit 4814, which is adapted to compress a frame image displayed at each frame moment in a video format.

Referring to FIG. 49, in implementations, the virtual viewpoint trajectory acquiring unit 4810 may include:

An instruction receiving subunit 4902, adapted to receive user instructions, and determine the virtual viewpoints according to the user instructions;

A virtual viewpoint trajectory generating unit 4904, adapted to arrange the virtual viewpoints in a chronological order of receiving the user instructions, to generate the virtual viewpoint trajectory.
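The division of work between the two subunits above might be sketched as follows. The class name `TrajectoryAcquirer`, the `determine_viewpoint` callable, and the use of explicit receive timestamps are hypothetical; the sketch only illustrates determining a viewpoint per instruction and arranging the results in the order in which the instructions were received.

```python
from typing import Any, Callable, List, Tuple


class TrajectoryAcquirer:
    """Illustrative counterpart of units 4902 and 4904 (names are assumptions)."""

    def __init__(self, determine_viewpoint: Callable[[Any], Any]) -> None:
        self._determine = determine_viewpoint
        self._entries: List[Tuple[float, Any]] = []

    def receive(self, received_at: float, instruction: Any) -> None:
        """Role of the instruction receiving subunit: instruction -> viewpoint."""
        self._entries.append((received_at, self._determine(instruction)))

    def trajectory(self) -> List[Any]:
        """Role of the trajectory generating unit: order by receive time."""
        ordered = sorted(self._entries, key=lambda entry: entry[0])
        return [viewpoint for _, viewpoint in ordered]
```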

Referring to FIG. 50, in implementations, the instruction receiving subunit 4902 may include:

A basic viewpoint determining module 5002, adapted to determine a basic viewpoint for viewing the to-be-viewed area, where the basic viewpoint includes a position and a perspective of the basic viewpoint;

A virtual viewpoint determining module 5004, adapted to determine the virtual viewpoint based on the user instruction and an association relationship between the user instruction and a changing manner of the virtual viewpoint based on the basic viewpoint, with the basic viewpoint as a reference.

With continued reference to FIG. 49, in implementations, the instruction receiving subunit 4902 is further adapted to detect a path of a touchpoint on a touch-sensitive screen, where the path includes at least one of a start point, an end point, and a moving direction of the touchpoint, with the path as the user instruction.

In implementations, the user instruction may include a selection of a specific object in the to-be-viewed area by the user. With continued reference to FIG. 48, the video generating apparatus may further include:

A specific object determining unit 4816, adapted to determine the specific object in the to-be-viewed area through an image recognition technology prior to receiving the user instruction;

A selection option providing unit 4818, adapted to provide a selection option of the specific object.

The terminology explanations, principles, implementation manners, and beneficial effects involved in the video generating apparatus in the example embodiments of the present disclosure may refer to the video generating interactive method in the example embodiments of the present disclosure, and details are not repeated herein.

Example embodiments of the present disclosure further provide a computer-readable storage medium having computer instructions stored thereon, and when the computer instructions are executed, the steps of the video generating method are performed.

The computer-readable storage medium may be various suitable media, such as an optical disc, a mechanical hard disk, and a solid-state hard disk. The computer-readable storage medium may be of a volatile or non-volatile type and may include removable or non-removable media, which may achieve storage of information using any method or technology. The information may include a computer-readable instruction, a data structure, a program module or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electronically erasable programmable read-only memory (EEPROM), quick flash memory or other internal storage technology, compact disk read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which may be used to store information that may be accessed by a computing device. As defined herein, the computer-readable storage medium does not include transitory media, such as modulated data signals and carrier waves.

Example embodiments of the present disclosure further provide a terminal, including a memory and a processor. The memory stores computer instructions thereon capable of running on the processor, and when the computer instructions are executed by the processor, the steps of the video generating method are performed.

The terminal may be various suitable terminals such as a smart phone, a tablet computer, and the like.

Although the present disclosure has been described as above, the present disclosure is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the scope defined by the claims.

Example Clauses

Clause 1. A video generating method, comprising: acquiring a virtual viewpoint trajectory in response to a recording instruction, wherein the virtual viewpoint trajectory is a set of virtual viewpoints arranged according to a chronological order, and the viewpoint is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is a range supporting viewpoint switching viewing in the to-be-viewed area; and recording a rendered picture under the viewpoint trajectory, wherein the rendered picture under the viewpoint trajectory is a picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory.

Clause 2. The video generating method according to clause 1, wherein the picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory includes images for viewing the to-be-viewed area based on the virtual viewpoint displayed according to the chronological order, wherein the images are generated based on a data combination and the virtual viewpoint, and the data combination includes pixel data and depth data of multiple synchronized images, and an associated relationship exists between image data and depth data of each image, and the multiple synchronized images have different perspectives with respect to the to-be-viewed area.

Clause 3. The video generating method according to clause 2, wherein the images for viewing the to-be-viewed area based on the virtual viewpoint include multiple frame images for viewing the to-be-viewed area based on the virtual viewpoint.

Clause 4. The video generating method according to clause 1, wherein recording the rendered picture under the viewpoint trajectory comprises: acquiring a frame image displayed at each frame moment according to a display frame rate.

Clause 5. The video generating method according to clause 3, further comprising: compressing a frame image displayed at each frame moment in a video format.

Clause 6. The video generating method according to clause 1, wherein acquiring the virtual viewpoint trajectory comprises: receiving user instructions, and determining the virtual viewpoints according to the user instructions; and arranging the virtual viewpoints in a chronological order of receiving the user instructions, to generate the virtual viewpoint trajectory.

Clause 7. The video generating method according to clause 6, wherein determining the virtual viewpoints according to the user instructions comprises: determining a basic viewpoint for viewing the to-be-viewed area, wherein the basic viewpoint includes a position and a perspective of the basic viewpoint; and determining the virtual viewpoints based on the user instructions and an association relationship between a user instruction and a changing manner of the virtual viewpoint based on the basic viewpoint, with the basic viewpoint as a reference.

Clause 8. The video generating method according to clause 7, wherein receiving user instructions comprises: detecting a path of a touchpoint on a touch-sensitive screen, wherein the path includes at least one of a start point, an end point, and a moving direction of the touchpoint, with the path as the user instruction.

Clause 9. The video generating method according to clause 8, wherein the association relationship between the path and the changing manner of the virtual viewpoint based on the basic viewpoint comprises: the number of paths is two, wherein a touchpoint of at least one path in the two paths moves in a direction away from the other party, and a position of the virtual viewpoint moves in a direction close to the to-be-viewed area.

Clause 10. The video generating method according to clause 8, wherein the association relationship between the path and the changing manner of the virtual viewpoint based on the basic viewpoint comprises: the number of the paths is two, wherein a touchpoint of at least one of the two paths moves in a direction close to the other party, and a position of the virtual viewpoint moves in a direction away from the to-be-viewed area.

Clause 11. The video generating method according to clause 8, wherein the association relationship between the path and the changing manner of the virtual viewpoint based on the basic viewpoint comprises: the number of the path is one, wherein a moving distance of the touchpoint is associated with a magnitude of change of a perspective, and a moving direction of the touchpoint is associated with a direction of change of the perspective.

Clause 12. The video generating method according to clause 6, wherein the user instructions include a voice control instruction.

Clause 13. The video generating method according to clause 6, wherein the user instructions include a selection of a preset viewpoint for viewing the to-be-viewed area.

Clause 14. The video generating method according to clause 13, wherein the preset viewpoint is taken as the virtual viewpoint.

Clause 15. The video generating method according to clause 6, wherein the user instructions include a selection of a specific object in the to-be-viewed area by the user.

Clause 16. The video generating method according to clause 15, prior to receiving the user instructions, the method further comprising: determining the specific object in the to-be-viewed area through an image recognition technology; and providing a selection option of the specific object.

Clause 17. The video generating method according to clause 6, wherein the user instructions include at least one of a position and a perspective of the virtual viewpoint.

Clause 18. The video generating method according to clause 6, wherein the user instructions include a voice control instruction.

Clause 19. The video generating method according to clause 6, wherein the user instructions include attitude change information from at least one of a gyroscope and a gravity sensor.

Clause 20. The video generating method according to clause 1, wherein the user instructions are received during a process of playing a video or displaying an image.

Clause 21. A video generating apparatus, comprising: a virtual viewpoint trajectory acquiring unit, configured to acquire a virtual viewpoint trajectory in response to a recording instruction, wherein the virtual viewpoint trajectory is a set of virtual viewpoints arranged according to a chronological order, and the viewpoint is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is a range supporting viewpoint switching viewing in the to-be-viewed area; and a rendered picture recording unit, configured to record a rendered picture under the viewpoint trajectory, wherein the rendered picture under the viewpoint trajectory is a picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory.

Clause 22. A computer-readable storage medium having computer instructions stored thereon, wherein when the computer instructions are executed, the steps of the video generating method according to any one of clauses 1 to 20 are performed.

Clause 23. A terminal comprising a memory and a processor, the memory storing computer instructions thereon capable of running on the processor, wherein when the processor executes the computer instructions, the steps of the video generating method according to any one of clauses 1 to 20 are performed.

What is claimed is:
 1. A method, comprising: acquiring a virtualviewpoint trajectory in response to a recording instruction, wherein thevirtual viewpoint trajectory is a set of virtual viewpoints arrangedaccording to a chronological order, and each virtual viewpoint of theset of virtual viewpoints is selected from a multi-anglefree-perspective range, and the multi-angle free-perspective range is arange supporting virtual viewpoint switching viewing in the to-be-viewedarea; and recording a rendered picture under the virtual viewpointtrajectory, wherein the rendered picture under the virtual viewpointtrajectory is a picture for viewing the to-be-viewed area according tothe virtual viewpoint trajectory.
 2. The method of claim 1, wherein thepicture for viewing the to-be-viewed area according to the virtualviewpoint trajectory includes images for viewing the to-be-viewed areabased on the virtual viewpoint displayed according to the chronologicalorder, wherein the images are generated based on a data combination andthe virtual viewpoint, and the data combination includes pixel data anddepth data of multiple synchronized images, and an associatedrelationship exists between image data and depth data of a respectiveimage of the multiple synchronized images, and the multiple synchronizedimages have different perspectives with respect to the to-be-viewedarea.
 3. The method of claim 2, wherein the images for viewing theto-be-viewed area based on the virtual viewpoint include multiple frameimages for viewing the to-be-viewed area based on the virtual viewpoint.4. The method of claim 1, wherein recording the rendered picture underthe virtual viewpoint trajectory comprises: acquiring a respective frameimage displayed at a respective frame moment according to a displayframe rate.
 5. The method of claim 3, further comprising: compressing arespective frame image displayed at a respective frame moment in a videoformat.
 6. The method of claim 1, wherein acquiring the virtualviewpoint trajectory comprises: receiving user instructions, anddetermining the set of virtual viewpoints according to the userinstructions; and arranging the set of virtual viewpoints in achronological order of receiving the user instructions, to generate thevirtual viewpoint trajectory.
 7. The method of claim 6, whereindetermining the set of virtual viewpoints according to the userinstructions comprises: determining a basic viewpoint for viewing theto-be-viewed area, wherein the basic viewpoint includes a position and aperspective of the basic viewpoint; and determining the set of virtualviewpoints based on the user instructions and an associationrelationship between a user instruction and a changing manner of thevirtual viewpoint based on the basic viewpoint, with the basic viewpointas a reference.
 8. The method of claim 7, wherein receiving userinstructions comprises: detecting a path of a touchpoint on atouch-sensitive screen, wherein the path includes at least one of astart point, an end point, and a moving direction of the touchpoint,with the path as the user instruction.
 9. The method of claim 8, whereinthe association relationship between the user instruction and thechanging manner of the virtual viewpoint based on the basic viewpointcomprises: the number of paths is two, wherein a first touchpoint of afirst path in the two paths moves in a direction away from a secondtouchpoint of a second path, and a position of the virtual viewpointmoves in a direction close to the to-be-viewed area.
 10. The method ofclaim 8, wherein the association relationship between the userinstruction and the changing manner of the virtual viewpoint based onthe basic viewpoint comprises: the number of the paths is two, wherein afirst touchpoint of a first path of the two paths moves in a directionclose to a second touchpoint of a second path, and a position of thevirtual viewpoint moves in a direction away from the to-be-viewed area.11. The method of claim 8, wherein the association relationship betweenthe user instruction and the changing manner of the virtual viewpointbased on the basic viewpoint comprises: the number of the path is one,wherein a moving distance of the touchpoint is associated with amagnitude of change of a perspective, and a moving direction of thetouchpoint is associated with a direction of change of the perspective.12. The method of claim 6, wherein the user instructions include a voicecontrol instruction.
 13. The method of claim 6, wherein the userinstructions include a selection of a preset viewpoint for viewing theto-be-viewed area.
 14. The method of claim 13, wherein the presetviewpoint is taken as the virtual viewpoint.
 15. The method of claim 6,wherein the user instructions include a selection of a specific objectin the to-be-viewed area by the user.
 16. The method of claim 15, priorto receiving the user instructions, the method further comprising:determining the specific object in the to-be-viewed area through animage recognition technology; and providing a selection option of thespecific object.
 17. The method of claim 6, wherein the user instructions include at least one of a position and a perspective of the virtual viewpoint.
 18. The method of claim 6, wherein the user instructions include attitude change information from at least one of a gyroscope and a gravity sensor.
 19. An apparatus, comprising: one or more processors; and memory, coupled to the one or more processors, the memory storing thereon computer-executable modules, executable by the one or more processors, the computer-executable modules including: a virtual viewpoint trajectory acquiring unit, configured to acquire a virtual viewpoint trajectory in response to a recording instruction, wherein the virtual viewpoint trajectory is a set of virtual viewpoints arranged according to a chronological order, and each virtual viewpoint of the set of virtual viewpoints is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is a range supporting virtual viewpoint switching viewing in the to-be-viewed area; and a rendered picture recording unit, configured to record a rendered picture under the virtual viewpoint trajectory, wherein the rendered picture under the viewpoint trajectory is a picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory.
 20. A computer-readable storage medium having computer instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform acts comprising: acquiring a virtual viewpoint trajectory in response to a recording instruction, wherein the virtual viewpoint trajectory is a set of virtual viewpoints arranged according to a chronological order, and each virtual viewpoint of the set of virtual viewpoints is selected from a multi-angle free-perspective range, and the multi-angle free-perspective range is a range supporting virtual viewpoint switching viewing in the to-be-viewed area; and recording a rendered picture under the virtual viewpoint trajectory, wherein the rendered picture under the virtual viewpoint trajectory is a picture for viewing the to-be-viewed area according to the virtual viewpoint trajectory.
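As a non-limiting illustration of the touch-path associations recited in claims 9 to 11, the following minimal sketch shows one way a terminal might map touch paths to a change of the virtual viewpoint. The names `TouchPath`, `Viewpoint`, and `apply_gesture`, the parametrisation of the viewpoint as a distance plus yaw and pitch angles, and the gain constants are all assumptions introduced for this example and are not taken from the disclosure. The relative movement of two touchpoints is approximated here by the change in the gap between them.

```python
from dataclasses import dataclass, replace
from math import hypot
from typing import Sequence, Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class TouchPath:
    """Hypothetical touch path: start and end point of one touchpoint."""
    start: Point
    end: Point

@dataclass(frozen=True)
class Viewpoint:
    """Hypothetical virtual viewpoint parametrisation (an assumption)."""
    distance: float  # distance from the viewpoint to the to-be-viewed area
    yaw: float       # horizontal perspective angle, in degrees
    pitch: float     # vertical perspective angle, in degrees

# Illustrative gains; real values would depend on the device and scene.
ZOOM_GAIN = 0.01     # distance units per pixel of change in touchpoint gap
ROTATE_GAIN = 0.1    # degrees per pixel of touchpoint movement
MIN_DISTANCE = 1.0   # keep the viewpoint outside the to-be-viewed area

def _gap(a: Point, b: Point) -> float:
    """Euclidean distance between two screen points."""
    return hypot(a[0] - b[0], a[1] - b[1])

def apply_gesture(vp: Viewpoint, paths: Sequence[TouchPath]) -> Viewpoint:
    """Return the viewpoint obtained by applying the touch paths to vp."""
    if len(paths) == 2:
        first, second = paths
        # Positive spread: the touchpoints moved apart, so the viewpoint
        # moves closer. Negative spread: they moved together, so it moves away.
        spread = _gap(first.end, second.end) - _gap(first.start, second.start)
        new_distance = max(MIN_DISTANCE, vp.distance - ZOOM_GAIN * spread)
        return replace(vp, distance=new_distance)
    if len(paths) == 1:
        path = paths[0]
        # Moving distance sets the magnitude of the perspective change;
        # moving direction sets its direction.
        dx = path.end[0] - path.start[0]
        dy = path.end[1] - path.start[1]
        return replace(vp,
                       yaw=vp.yaw + ROTATE_GAIN * dx,
                       pitch=vp.pitch + ROTATE_GAIN * dy)
    return vp  # Unrecognised gesture: leave the viewpoint unchanged.
```

For example, under these assumptions, two touchpoints spreading apart reduce `distance`, while a single touchpoint dragged to the right increases `yaw` in proportion to the drag length.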